Abstract: The invention relates to novel polynucleotides associated with human disease, and in particular to osteoarthritis. The
invention further relates to polymorphic polynucleotides associated with osteoarthritis. The invention provides methods of
determining if a particular polymorphism predisposes an individual to or is associated with the development of osteoarthritis. The
invention also provides methods of detecting the presence of one or more polymorphisms as an indicator of osteoarthritis, and
provides for use of novel polynucleotides of the invention in the development of drugs and in disease treatment.

NUCLEOTIDE POLYMORPHISMS ASSOCIATED WITH OSTEOARTHRITIS

TECHNICAL FIELD 

The invention relates in general to polymorphisms in genes associated with osteoarthritis and
bone remodeling and methods of identifying individuals having a gene containing a polymorphism
associated with osteoarthritis. The invention also relates to a method of detecting an increased
susceptibility to a disease in an individual resulting from the presence of a polymorphism or mutation in
the gene coding sequence of an osteoarthritis and bone remodeling associated gene.

BACKGROUND OF THE INVENTION 
Single nucleotide substitutions and small unique insertions and deletions are the most frequent
form of DNA polymorphism and disease-causing mutation in the human genome. These DNA
sequence variations, called single nucleotide polymorphisms (SNPs), have gained popularity and have
been proposed as the genetic markers of choice for the study of complex genetic traits (Collins et al.
1997 Science 278: 1580-1581; Risch and Merikangas 1996 Science 273: 1516-1517). Despite the fact
that on average approximately one nucleotide position in every 1000 bases along the human
chromosome is estimated to differ between any two copies of the chromosome (Cooper et al. 1985
Human Genetics 69: 201-205; Kwok et al. 1996 Genomics 31: 123-126), developing SNP markers is
not easy.

It has been suggested that association studies (such as linkage disequilibrium studies) with a set
of single nucleotide polymorphism (SNP) markers evenly spaced across the genome at approximately
100 kb intervals would provide the necessary power to detect the small effects of each gene involved
in a complex trait (Ifeuser et al. 1996 Genetic Epidemiology 13: 117-137 in Kwok and Chen 1998
Genetic Engineering 20: 125-134, Plenum Press, New York). Alternatively, one can take a candidate
gene approach in performing association studies with the use of a set of gene-associated SNP
markers to detect these genetic factors (ibid.).

Nucleotide sequence mutations which occur in a gene or gene family, where the gene or gene
family is associated with a given disease, may be the basis for susceptibility to or development of the
disease.

Arthritis means "inflammation of a joint" and encompasses more than a hundred diseases. 
They can affect the joints and other connective tissues such as muscles, tendons, ligaments and 
protective coverings of internal organs. The major arthritis diseases are as follows: 

1. osteoarthritis - non-inflammatory degenerative joint disease characterised by splitting and
fragmentation of the articular cartilage, hypertrophy of the bone and changes in the synovial
membrane.

2. rheumatoid arthritis - chronic systemic, relapsing disease primarily of the joints which is marked
by inflammatory changes in the synovial membranes and adjacent structures.

3. ankylosing spondylitis - inflammatory disease that affects the joints of the lower back which
may lead to fusion of the spine.

4. gout - caused by formation of uric acid crystals in the joint, leading to inflammation and severe
pain.

Osteoarthritis is the most common type of arthritis. It differs from rheumatoid arthritis in that
it is primarily a degeneration of the joint tissue that may be accompanied by an inflammatory reaction
(Figure 1). Rheumatoid arthritis is an inflammatory disease first and foremost and inflammation of the
synovium is the focal point of the disease.

The initiation and progression of osteoarthritis involves multiple pathogenic mechanisms. An
imbalance of chondrocyte-controlled anabolic and catabolic processes results in a progressive
degradation of the components of the extracellular matrix of the articular cartilage, associated with
secondary inflammatory factors. The primary cause of this is unknown but possibly involves a
deficiency of cellular response to normal tissue demand or insufficient cellular response to
supernormal demand from mechanical loading or injury. The subsequent repair response could induce
elevated levels of anabolic molecules, leading to remodelling of the bone and production of osteophytes
(bone outgrowths) characteristic of the disease process.

Prevalence and social cost of osteoarthritis. 

With approximately 40 million Americans affected by arthritis and other inflammatory
diseases, the cost to the healthcare system is significant. Of these 40 million people, 21 million have
osteoarthritis and 2.1 million have rheumatoid arthritis. Osteoarthritis is the most common chronic
condition and cause of inactivity in patients older than 65. The disease usually occurs at the beginning
of the fifth decade of life, with increasing prevalence and incidence with advancing age (Table 2).
The prevalence of arthritis is expected to increase by 57% by the year 2020. In the same time period,
activity limitation caused by arthritis will increase 66% to 11.6 million people (Lawrence et al 1998).
The primary impact of arthritis in the elderly is decreased physical functioning. This can be due to
other health-related problems, such as weight gain, cardiovascular disease, GI distress related to
treatment, increased psychological distress, decreased social functioning, increased work disability, and
increased healthcare utilization. The current OA treatment, NSAIDs, is responsible for the highest
number of hospitalisations of any drug category and causes a significant number of internal
gastrointestinal bleeds in the elderly population.

The cost of arthritis in the US (including rheumatoid arthritis, osteoarthritis and all other
rheumatic conditions) was shown to be $64.8 billion in 1992. Of this, direct costs were an estimated
$15.2 billion and indirect costs $49.6 billion (Yelin and Callahan 1995). A 1997 study showed the cost
of care for osteoarthritis to be $543 per patient per year (Lanes et al 1997). The largest component was
hospital care, mostly due to admissions for hip or knee replacement. The cost to the healthcare
provider is very high due to the prevalence of the illness.

Unmet medical needs for OA

Current treatment options for osteoarthritis focus on symptom relief, whereas truly
disease-modifying agents or methods are lacking. Thus, the basic therapy includes common analgesics,
nonsteroidal anti-inflammatory drugs, physical therapy, walking aids, and eventually, in severe cases,
joint replacement surgery. Perhaps because of the difficulties involved in measuring disease
progression, existing medications do not address the need to prevent further cartilage degradation.

To develop such drugs the following should be in place:

- Compounds that target appropriate biochemical pathways (e.g. Merck's MMP-3 antagonist).
- Clinical studies must be able to measure disease progression in a cost-effective and safe
fashion. This could be either an imaging technique or a biomarker that closely correlates with disease
progression.
- Disease progression should be detectable within a reasonable time scale (for example,
anti-inflammatory clinical studies use the WOMAC pain scale for a period of 6 weeks to measure
improvement due to medication).
- The efficacy of the new drug under development should be observable (using either the
imaging or biomarker method of assessment) in a sample size comparable to that of other clinical
trials.

How can genetics help? Genetic studies have the potential to detect:

- Novel drug targets in the appropriate pathways.
- Individuals with fast progressing osteoarthritis. This would allow a pharmaceutical company to
prove efficacy in a relatively small sample size and in a reasonable period of time, thus cutting costs.
- Markers that reduce variation in biomarker or imaging patterns. For example, let us assume a
hypothetical study of response to medication in which, although there is a clear pattern of response to
medication, it is not statistically significant because of the large amount of variation in disease
progression. Let us now assume that there exists a genetic marker that is able to stratify the
measurement of disease progression in this hypothetical study. The variance of the marker of disease
progression associated with each genotype is smaller than the overall variance. This can be seen as
analogous to stratifying a relevant clinical measure in a study (e.g. lipid levels) by gender or by age
group. By pooling together both genders or both age groups the variance is larger. If we were now to
stratify the results of the previous hypothetical study by genotype, we might observe that the
therapeutic efficacy is now statistically significant. By stratifying according to genotype it could then
be possible to detect statistically significant efficacy in both groups, while meeting the cost and time
needs of the entity developing the drug.
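
The stratification argument can be illustrated with a small numerical sketch. The progression scores and the two genotype labels below are invented purely for illustration and are not data from the invention; the sketch only shows that the variance within each genotype group can be much smaller than the variance of the pooled sample.

```python
from statistics import pvariance

# Hypothetical disease-progression scores grouped by a hypothetical genotype.
# These values are illustrative assumptions, not measurements.
progression_by_genotype = {
    "AA": [1.0, 1.2, 0.8, 1.0],
    "AG": [3.0, 3.2, 2.8, 3.0],
}


def pooled_and_stratified_variance(groups):
    """Return (variance of the pooled sample, {genotype: within-group variance})."""
    pooled = [score for scores in groups.values() for score in scores]
    within = {genotype: pvariance(scores) for genotype, scores in groups.items()}
    return pvariance(pooled), within


pooled_var, within_var = pooled_and_stratified_variance(progression_by_genotype)
print("pooled variance:", pooled_var)
print("within-genotype variance:", within_var)
```

In this toy data set most of the pooled variance comes from the difference between the two genotype means, which is the situation the hypothetical study describes.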

Genetic study of osteoarthritis.

Evidence for genetic predisposition to OA.

The nature of the genetic influence in osteoarthritis may involve either a structural defect (that
is, collagen), alterations in cartilage or bone metabolism, or a genetic influence on a known risk factor
for osteoarthritis such as obesity. Twin studies have shown that between 39% and 65% of
osteoarthritis in the general population can be attributed to genetic factors (MacGregor and Spector,
1999). Linkage analyses (i.e., common inheritance of affected individuals in the same family) have
identified a higher risk ratio for relatives of affected individuals compared to the general population.
The power to detect disease-susceptibility loci through linkage analysis using pairs of affected relatives
depends on λR, the risk ratio for type R relatives compared with population prevalence (Risch 1990).
Kellgren et al. (1963) compared expected and observed incidence of osteoarthritis in first-degree
relatives of probands with multiple osteoarthritis. Based on their results we have estimated λR for nodal
and non-nodal osteoarthritis.



        Type of osteoarthritis                   λR
        Nodal (presence of Heberden's nodes)     4.5
        Non-nodal                                4.75

For comparison, concordance for type 2 diabetes ranges between 2 and 3, and between 4.5 and
5.5 for rheumatoid arthritis. These figures indicate a high genetic component to OA. If, however,
non-nodal and nodal types of OA are mixed together, λR drops to ~2.0, highlighting the importance of
careful clinical characterization for genetic studies.

Although it is known that there is a genetic component involved in the etiology of osteoarthritis,
there is also a need in the art for an improved understanding of the genetic causes of osteoarthritis.

There is also a need in the art for identification of the genes associated with osteoarthritis, and
identification of sequence variations in these genes that are associated with osteoarthritis and bone
remodeling. The identification of disease related sequence variations in osteoarthritis and bone
remodeling associated genes will allow for the development of improved methods of screening for
osteoarthritis. These improved screening protocols may be used to identify individuals at high risk for
osteoarthritis and in need of preventative treatments.

The identification of disease related sequence variations in osteoarthritis associated genes may
facilitate the design of treatment protocols and the identification and design of compounds useful for
treatment of osteoarthritis and bone remodeling.

OBJECTS AND SUMMARY OF THE INVENTION 
An object of the present invention is to provide candidate genes associated with osteoarthritis
and bone remodeling.

It is another object of the present invention to provide a variant nucleotide in a candidate gene
associated with osteoarthritis and bone remodeling.

Another object of the present invention is to provide methods of detecting variant nucleotides 
in a gene in individuals at risk for osteoarthritis. 

Another object of the present invention is to provide methods of determining if a variant
nucleotide is associated with a predisposition to osteoarthritis.

Another object of the present invention is to provide candidate genes associated with
osteoarthritis and bone remodeling.

The invention further comprises isolated polynucleotides which contain the single nucleotide
polymorphisms selected from the Sequence Listing, or its perfect complement.

The invention further comprises an isolated polynucleotide segment of between 10 and 100
bases of which 10 contiguous bases including a polymorphic site are from a sequence selected from
the Sequence Listing, or its perfect complement.

The invention further comprises a probe or target sequence used for genotyping where the
probe or target sequence has at least 10 contiguous bases, containing an identified polymorphic site,
from a sequence selected from the Sequence Listing, or its perfect complement.

The invention further comprises a method for determining a base occupying a polymorphic site
in a nucleic acid comprising obtaining the nucleic acid in a sample from an individual or plurality of
individuals and determining a base occupying a polymorphic site in a sequence selected from the group
consisting of the Sequence Listing and their perfect complements which occurs in the sample nucleic
acid.

DESCRIPTION OF THE COMPACT DISK-RECORDABLES (CD-R) 

CD-R (Copy 1) contains the Sequence Listing formatted in plain ASCII text and Tables 1 and
2. CD-R (Copy 1) is labeled with Identification No. GX-0022P-1.

CD-R (Copy 2) is an exact copy of CD-R (Copy 1). CD-R (Copy 2) is labeled with
Identification No. GX-00224 P (Copy 2).

CD-R (Copy 3) contains the Computer Readable Form of the Sequence Listing in compliance
with 37 C.F.R. §1.821(e), and specified by 37 C.F.R. §1.824. CD-R (Copy 3) is labeled with
Identification No. GX-0022-1 P (Copy 3).

The material on CD-R 1, 2 and 3 is incorporated by reference into the specification. 

BRIEF DESCRIPTION OF THE TABLES AND DRAWINGS 
These and other features, aspects, and advantages of the present invention will become better
understood with regard to the following description, appended claims, and accompanying tables and
drawings where:

Table 1 presents the genomic or cDNA structure of osteoarthritis candidate gene sequences
and the identity and position of polymorphisms which are the subject of the invention. This table has
the form wherein:

a. The DNA change given for an allele is not strand specific; it can
be on either strand of the DNA molecule.

b. Single Nucleotide Polymorphisms can be recorded as IUPAC ambiguity
symbols, as follows:

        M    A or C
        R    A or G
        S    C or G
        K    G or T
        W    A or T
        Y    C or T
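
The ambiguity symbols of item b can be held in a simple lookup table. The following is a minimal sketch; the names `IUPAC_TWO_BASE`, `alleles_for` and `symbol_for` are hypothetical helpers introduced here for illustration and are not part of the Sequence Listing or of Table 1.

```python
# Two-base IUPAC ambiguity symbols, exactly as listed in item b.
IUPAC_TWO_BASE = {
    "M": {"A", "C"},
    "R": {"A", "G"},
    "S": {"C", "G"},
    "K": {"G", "T"},
    "W": {"A", "T"},
    "Y": {"C", "T"},
}


def alleles_for(symbol):
    """Return the sorted pair of bases encoded by an ambiguity symbol."""
    return sorted(IUPAC_TWO_BASE[symbol.upper()])


def symbol_for(base1, base2):
    """Return the ambiguity symbol for a biallelic SNP (hypothetical helper)."""
    pair = {base1.upper(), base2.upper()}
    for symbol, bases in IUPAC_TWO_BASE.items():
        if bases == pair:
            return symbol
    raise ValueError(f"no two-base IUPAC symbol for {base1}/{base2}")
```

For example, `alleles_for("R")` yields A and G, and `symbol_for("T", "C")` yields Y regardless of the order in which the two bases are given.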

c. Other allele types, such as insertions and deletions, are given in the form: ACA>AA
or AA>ACA and in such cases the coordinates of the allele include the two invariant
flanking bases.
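
The insertion and deletion notation of item c can be read mechanically. The sketch below assumes, as item c states, that the first and last base on each side of the `>` are the invariant flanking bases; the function name `parse_indel` is a hypothetical helper and the handling of anything other than a simple insertion or deletion is an assumption.

```python
def parse_indel(change):
    """Split a change such as 'ACA>AA' into its type and the affected bases.

    Assumes the first and last base on each side are the invariant flanks.
    """
    before, after = change.split(">")
    if before[0] != after[0] or before[-1] != after[-1]:
        raise ValueError("flanking bases differ: " + change)
    inner_before, inner_after = before[1:-1], after[1:-1]
    if inner_before and not inner_after:
        return ("deletion", inner_before)
    if inner_after and not inner_before:
        return ("insertion", inner_after)
    raise ValueError("not a simple insertion or deletion: " + change)
```

Under these assumptions `parse_indel("ACA>AA")` reports a deletion of C and `parse_indel("AA>ACA")` reports an insertion of C.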

d. DNA sequence names are of the form: XX:nnnn[_VV], where XX gives the database of
origin, as follows:

        EM    EMBL
        FN    Incyte FL sequence read
        GB    GenBank
        IN    Incyte proprietary sequence
        LG    LifeSeq Gold gene template

nnnn gives the sequence ID or accession number for the sequence. In most cases, if it is an
accession number it will be followed by _VV, where VV is the sequence version in the EMBL or
GenBank database.

e. The overall structure of a record in the patent structure is described as follows. Items in
{braces} indicate a field that is filled in. Items in [square brackets] may or may not be present.
These entries define a larger virtual sequence - a "link" com...
Alleles are annotated onto real sequences, and genomic structure onto the link.
{Locus ID}
    [Full name : {full name}]
    link : {link name}
        Subsequence {name} {link start position} {link stop position} {SEQ ID NO}
        [...]
        CDS {name} {SEQ ID NO}
            exon/ORF {link start position} {link stop position}
            [...]
        [...]
        Allele {seq name} {SEQ ID NO} {seq start} {seq stop} {dna change}
            source {original SNP data source} {SNP id in that source}
            [...]
            consequence {CDS name} {CDS SEQ ID NO} {class} [{peptide pos} {peptide change}]
            [...]
        [...]

f. Sources. SNPs may have been noted in one of several sources:

        dbSNP     The NCBI public dbSNP databank
        isSNP     In silico SNPs from LifeSeq sequence assembly.
        wetSNP    Alleles determined by SSCP.

Alleles which have a wetSNP entry are experimentally verified. Alleles which are isSNP
and/or dbSNP only are predictions by computer software of where these SNPs map to, and are *not*
experimentally verified.
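
The verification rule of item f reduces to a membership test over the source lines of an allele. The sketch below is only an illustration; the function name and the representation of an allele's sources as a list of strings are assumptions, not part of the table format.

```python
# Per item f, only a wetSNP entry (determined by SSCP) counts as experimental evidence.
EXPERIMENTAL_SOURCES = {"wetSNP"}


def is_experimentally_verified(sources):
    """Return True if any source of the allele is a wetSNP entry.

    `sources` is an assumed representation: the list of source names
    (dbSNP, isSNP, wetSNP) recorded for one allele.
    """
    return any(source in EXPERIMENTAL_SOURCES for source in sources)
```

An allele listed under both dbSNP and wetSNP is therefore treated as verified, while one listed under isSNP and dbSNP only remains a computational prediction.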

g. Consequences

The classes of consequence are as follows:

        Silent        The allele does not cause a peptide change
        Missense      The allele causes an amino acid substitution
        Frameshift    The allele causes a frame shift in the CDS
        Intron        The allele lies wholly within an intron
        5'            The allele lies 5' of the CDS
        3'            The allele lies 3' of the CDS
        Unknown       The consequence is undefined, for example the allele straddles
                      an intron/exon boundary.

Silent and Missense consequences also supply details of the amino acid position of the change,
and prediction of what the affected amino acid is, and what it is substituted to. There may be multiple
consequence lines if the locus contains multiple CDS forms.

h. Sequence and exon positions

Sequence coordinates are always given on the forward strand of the link. Therefore, if a
sequence or exon is actually on the reverse strand of the link, its start position will be larger than its
stop position.
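
The coordinate convention of item h implies that the strand of a sequence or exon can be recovered from the order of its start and stop positions. A minimal sketch follows; the function name and the returned tuple layout are assumptions made for illustration.

```python
def strand_and_span(start, stop):
    """Return (strand, low, high) for a feature given in link coordinates.

    Per item h, coordinates are on the forward strand of the link, so a
    start position larger than the stop position signals the reverse strand.
    """
    if start <= stop:
        return ("forward", start, stop)
    return ("reverse", stop, start)
```

For instance, a feature recorded with start 900 and stop 700 is on the reverse strand and covers link positions 700 to 900.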

i. Exon order in CDS definitions

The exons are given in 5' to 3' order. Consequently, reverse strand CDS start from high
coordinate numbers downwards.
j. Link object types 

Loci may have more than one link object, composed of different DNA sequences. Typically there 
might be one genomic and one cDNA link object. 

Table 2 presents the population frequency of polymorphisms in the candidate genes and
summarizes various information from Table 2 relating to the polymorphism.

Figure 1 illustrates the cDNA structure of the locus and relative positions of identified SNPs
for megakaryocyte stimulating factor (MSF).

Figure 2 illustrates the genomic structure of the locus, exons composing multiple CDS, and
relative positions of identified SNPs for megakaryocyte stimulating factor (MSF).

The figures show (from left to right) the real sequences making up the linked genomic
structure for the locus, a scale in link coordinates (negative numbers would indicate a view of the
reverse strand), one or more CDSs representing the positions of exons, horizontal bars representing
the positions of identified SNPs (alleles) from the various sources, and shaded boxes showing regions
targeted for screening by SSCP.

DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Before the present compositions and methods are described, it is understood that embodiments
of the invention are not limited to the particular machines, instruments, materials, and methods
described, as these may vary. It is also to be understood that the terminology used herein is for the
purpose of describing particular embodiments only, and is not intended to limit the scope of the
invention.

As used herein and in the appended claims, the singular forms "a," "an," and "the" include
plural reference unless the context clearly dictates otherwise. Thus, for example, a reference to "a
nucleic acid probe" includes a plurality of such nucleic acid probes, and a reference to "a gene" is a
reference to one or more genes and equivalents thereof known to those skilled in the art, and so forth.

Unless defined otherwise, all technical and scientific terms used herein have the same
meanings as commonly understood by one of ordinary skill in the art to which this invention belongs.
Although any machines, materials, and methods similar or equivalent to those described herein can be
used to practice or test the present invention, the preferred machines, materials and methods are now
described. All publications mentioned herein are cited for the purpose of describing and disclosing the
cell lines, protocols, reagents and vectors which are reported in the publications and which might be
used in connection with various embodiments of the invention. Nothing herein is to be construed as an
admission that the invention is not entitled to antedate such disclosure by virtue of prior invention.
Definitions 

As used herein, "polymorphism" refers to a nucleotide alteration that either predisposes an
individual to a disease or is not associated with a disease, which occurs as a result of a substitution,
insertion or deletion.

More particularly, a "polymorphism" or "polymorphic variation" may be a nucleic acid
sequence variation, as compared to the naturally occurring sequence, resulting from either a nucleotide
deletion, an insertion or addition, or a substitution, which is present at a frequency of greater than 1%
in a population.

As used herein, "neutral polymorphism" refers to a polymorphism which is present at a
frequency of greater than 1% in a population, which does not alter gene function or phenotype, and
thus is not associated with a predisposition to or development of a disease.

As used herein, "polynucleotide sequence" refers to a sense or antisense nucleic acid
sequence comprising RNA, cDNA, genomic DNA, synthetic forms and mixed polymers, that may be
chemically or biochemically modified or may contain non-natural or derivatized nucleotide bases.

As used herein, "mutation" refers to a variation in the nucleotide sequence of a gene or
regulatory sequence as compared to the naturally occurring or normal nucleotide sequence. A
mutation may result from the deletion, insertion or substitution of more than one nucleotide (e.g., 2, 3,
4, or more nucleotides) or a single nucleotide change such as a deletion, insertion or substitution. The
term "mutation" also encompasses chromosomal rearrangements.

As used herein, "nucleic acid probe" refers to an oligonucleotide, nucleotide or polynucleotide,
and fragments and portions thereof, and to DNA or RNA of genomic or synthetic origin which may be
single- or double-stranded, which represents the sense or antisense strand. Both terms "nucleic acid
probe" and "DNA fragment" refer to a length of polynucleotide, for example, as small as 5
nucleotides, 10, 20, 25, 40, 50, 75, 100, 250, 400, 500 and 1 kb, and as large as 5-10 kb.

As used herein, "alteration" refers to a change in either a nucleotide or amino acid sequence,
as compared to the naturally occurring sequence, resulting from a deletion, an insertion or addition, or
a substitution.

As used herein, "deletion" refers to a change in either nucleotide or amino acid sequence 
wherein one or more nucleotides or amino acid residues, respectively, are absent. 

As used herein, "insertion" or "addition" refers to a change in either nucleotide or amino acid
sequence wherein one or more nucleotides or amino acid residues, respectively, have been added.

As used herein, "substitution" refers to a replacement of one or more nucleotides or amino 
acids by different nucleotides or amino acid residues, respectively. 

As used herein, "specifically hybridizable" refers to a nucleic acid or fragment thereof that
hybridizes to another nucleic acid (or a complementary strand thereof) due to the presence of a region
that is at least approximately 90% homologous, preferably at least approximately 90-95% homologous,
and more preferably approximately 98-100% homologous, as are polynucleotides that hybridize to a
partner under stringent hybridization conditions. "Stringent" hybridization conditions are defined
hereinbelow for various hybridization protocols. A probe that is specifically hybridizable to a given
sequence can be used to detect a 1 bp out of 10 bp (10%) or a 1 bp out of 20 bp (5%) difference
between nucleic acid sequences and is therefore useful for discriminating between a wild type and a
mutant form of a gene of interest.
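
The 10% and 5% figures are simply the fraction of differing positions between two aligned sequences of equal length. A minimal sketch of that arithmetic follows; the function name and the example sequences are invented for illustration, and the sketch says nothing about actual hybridization behaviour.

```python
def percent_mismatch(seq_a, seq_b):
    """Percentage of positions that differ between two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Hypothetical 10-base and 20-base sequences, each differing at one position.
ten_mer_difference = percent_mismatch("ACGTACGTAC", "ACGTTCGTAC")
twenty_mer_difference = percent_mismatch("ACGTACGTACACGTACGTAC", "ACGTACGTACACGTTCGTAC")
```

With one differing base, the 10-base pair gives a 10% difference and the 20-base pair gives a 5% difference, matching the two cases named in the definition.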

As used herein, "amino acid sequence" refers to the sequential array of amino acids that have
been joined by peptide bonds between the carboxylic acid group of one amino acid and the amino
group of the adjacent amino acid to form long linear polymers comprising proteins.

As used herein, "amino acid" refers to protein subunit molecules that contain a carboxylic acid 
group, and an amino group, both linked to a single carbon atom. 

A polypeptide is said to be "encoded" by a polynucleotide if the polynucleotide, either in its
native state or in a recombinant form, can be transcribed and/or translated to produce the mRNA for
and/or the polypeptide or a fragment thereof.

As used herein, "gene" refers to a region of DNA which includes a portion which can be
transcribed into RNA, and which may contain an open reading frame, or coding region (also referred
to as an exon) which encodes a protein, a non-coding region (also referred to as an intron), and a
specific regulatory region comprising the DNA regulatory elements which control expression of the
transcribed region.

As used herein, "coding region" refers to a region of DNA which encodes a protein, also 
known as an exon. 

As used herein, "non-coding region" refers to a region of DNA which does not encode a
protein coding region, also known as an intron, and is not included in the RNA molecule that is
synthesized from a particular gene.

As used herein, "regulatory region" refers to DNA sequences which are located either 5' of
the transcription start site, 3' of the transcription termination site, or within an intron or exon, capable of
ensuring that the gene is transcribed at the proper time and in the appropriate cell type.

As used herein, "consensus DNA sequence" or "wild-type DNA sequence" refers to a
sequence wherein every position represents the nucleotide that occurs with the highest frequency
when many actual sequences are compared. As used herein, "consensus DNA sequence" or
"wild-type DNA sequence" also refers to the normal, naturally occurring DNA sequence.


As used herein, a given sequence (or mutation or polymorphism) "associated with"
osteoarthritis refers to a nucleic acid sequence that increases susceptibility to the disease, predisposes
an individual to the disease or contributes to the disease, wherein the nucleic acid sequence is present
at a higher frequency (at least 5%, preferably 10%, more preferably 25% higher) in individuals with
the disease as compared to individuals who do not have the disease.
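
The frequency thresholds in this definition can be expressed as a simple comparison. The sketch below assumes that "higher" means a difference in percentage points between the two groups; the text could equally be read as a relative increase, and the function name, tier labels and example frequencies are all hypothetical.

```python
def association_tier(cases_percent, controls_percent):
    """Classify the excess frequency of a sequence in affected individuals.

    Frequencies are given in percent. The excess is taken as a difference in
    percentage points, which is an assumed reading of the definition.
    Returns None when the excess is below the 5% threshold.
    """
    excess = cases_percent - controls_percent
    if excess >= 25:
        return "more preferred (at least 25% higher)"
    if excess >= 10:
        return "preferred (at least 10% higher)"
    if excess >= 5:
        return "associated (at least 5% higher)"
    return None
```

For example, a sequence found in 32% of affected and 20% of unaffected individuals has a 12-point excess and falls in the "preferred" tier under this reading.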

As used herein, a sequence "not associated with" osteoarthritis refers to a nucleic acid 
sequence that does not increase susceptibility to the disease, predispose an individual to the disease or 
contribute to the disease, wherein the nucleic acid sequence is not present at a higher frequency in 
individuals with the disease, and thus is present at a frequency about equal to its frequency in 
individuals who do not have the disease. 

As used herein, "amplifying" refers to producing additional copies of a nucleic acid sequence,
preferably by the method of polymerase chain reaction (Mullis and Faloona, 1987, Methods Enzymol.,
155:335).

As used herein, "oligonucleotide primers" refer to single stranded DNA or RNA molecules
that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second nucleic acid
strand. Oligonucleotide primers useful according to the invention are between 5 and 100 nucleotides in
length, preferably 20-60 nucleotides in length, and more preferably 20-40 nucleotides in length.

As used herein, "sequencing" refers to determining the precise nucleotide composition or
sequence of a nucleic acid region by methods well known in the art (see Ausubel et al., supra and
Sambrook et al., supra).

As used herein, "comparing" a sequence refers to determining if the nucleotides at one or
more positions in a particular region of a nucleic acid fragment are identical for any two or more
sequences. According to the invention, sequence comparisons can be performed by using computer
program analysis as described below in Section F entitled "Identification and Characterization of
Polymorphisms".

As used herein, "sequence differences" or "sequence variations" refer to nucleotide changes, 
at one or more positions between any two or more sequences being compared. 

As used herein, "determining the presence of polymorphic variations" refers to using methods
well known in the art to identify a nucleotide, at one or more positions within a particular nucleic acid
region, that is distinct from the nucleotide present in the naturally occurring, wild-type or consensus
sequence, resulting from either a nucleotide deletion, an insertion or addition, or a substitution.

As used herein, "determining the absence of polymorphic variations" refers to using methods
well known in the art to determine that the nucleotides present at every position analyzed in a
particular nucleic acid region are identical to the nucleotides present in the naturally occurring,
wild-type or consensus sequence.

As used herein, "genotyping" refers to determining the composition of the genetic material
that is inherited by an organism from its parents.

As used herein, "biological sample" refers to a tissue or fluid sample containing a
polynucleotide or polypeptide of interest, and isolated from an individual including but not limited to
plasma, serum, spinal fluid, lymph fluid, urine, stool, external secretions of the skin, respiratory,
intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and samples of in vitro cell
culture constituents.

As used herein, "amplimers" refer to a specific fragment of DNA generated by PCR that is
at least 30 bp in length and is preferably between 50 and 100 bp in length, and is more preferably
between 150-300 bp in length, with a melting temperature in the range of approximately 60-62°C.

As used herein, "phenotype" refers to the biological appearances of an organism or a tissue
derived from an organism, wherein biological appearances include chemical, structural and behavioral
attributes, and excludes genetic constitution.

As used herein, "genotype" refers to the genetic material that is inherited by an organism from
its parents.

As used herein, "genetic susceptibility to osteoarthritis" refers to an increased risk of
developing osteoarthritis resulting from specific DNA differences relative to non-susceptible
individuals. Preferably an individual who is genetically susceptible to osteoarthritis has a 5-100%, and
more preferably a 25-50% greater chance of developing osteoarthritis, as compared to
non-susceptible individuals.

As used herein, "diagnostic" refers to the practice of identifying a disease from the signs and
symptoms of an individual, including the DNA sequences of genes that are associated with an
increased susceptibility to the disease. "Diagnostic" also refers to the practice of stratifying patient
populations based on the efficacy or toxicity of a composition, and the predictive placement of an
individual in a response stratum based on strata-associated parameters.

As used herein, "prognosis" refers to the possibility of recovering from a particular disease or 
condition, and also refers to risk assessment of developing a particular disease or condition. 

THE INVENTION 

Various embodiments of the invention include polynucleotides and polymorphic polynucleotides
associated with a given human disease, for example, with osteoarthritis. The invention also provides a
gene sequence containing one or more polymorphic nucleotides associated with a predisposition to or
the development of a given human disease such as osteoarthritis. The invention also relates to
polypeptides encoded by the polynucleotides or the polymorphism-containing gene. The invention also
provides methods of detecting a polymorphism according to the invention in individuals at risk for
osteoarthritis, and for determining if a given polymorphism is associated with a predisposition to the
disease. The invention also discloses polymorphism(s) that are either associated with osteoarthritis or
are not associated with it (i.e., are neutral). A polymorphism in a given gene can be utilized in
various diagnostic and therapeutic methods and procedures, for example, in nucleic acid and peptide
diagnosis, drug screening and design, and in gene and peptide therapy. A polymorphism associated
with a given gene can be utilized in various gene expression systems and assays designed to analyze
gene regulation and expression.

A. Design and Synthesis of Oligonucleotide Primers 

According to the present invention, oligonucleotide primers are disclosed that are useful for
determining the sequence of a particular allele of a gene. The invention also discloses oligonucleotide
primers designed to amplify a region of a gene that is known to contain a polymorphism. The invention
also discloses oligonucleotide primers designed to anneal specifically to a particular allele of a gene.

Oligonucleotide primers useful according to the invention are single-stranded DNA or RNA
molecules that are hybridizable to a nucleic acid template and prime enzymatic synthesis of a second
nucleic acid strand. The primer is complementary to a portion of a target molecule present in a pool of
nucleic acid molecules. It is contemplated that oligonucleotide primers according to the invention are
prepared by synthetic methods, either chemical or enzymatic. Alternatively, such a molecule or a
fragment thereof is naturally occurring, and is isolated from its natural source or purchased from a
commercial supplier. Oligonucleotide primers are 5 to 100 nucleotides in length, ideally from 20 to 40
nucleotides, although oligonucleotides of different length are of use.

Pairs of single-stranded DNA primers can be annealed to sequences within or surrounding a
gene on chromosome Y in order to prime amplifying DNA synthesis of a region of a gene. A
complete set of gene primers will allow synthesis of all of the nucleotides of the coding sequences,
e.g., the exons, introns and control regions. Preferably, the set of primers will also allow synthesis of
both intron and exon sequences.

Allele-specific primers are also useful according to the invention. Such primers will anneal
only to a particular mutant allele (e.g., alleles containing a polymorphism), and thus will only amplify a
product if the template also contains the polymorphism. Allele-specific primers that anneal only to a
wild-type gene sequence are also useful according to the invention.

Typically, selective hybridization occurs when two nucleic acid sequences are substantially
complementary (at least about 65% complementary over a stretch of at least 14 to 25 nucleotides,
preferably at least about 75%, more preferably at least about 90% complementary). See Kanehisa,
M., 1984, Nucleic Acids Res. 12: 203, incorporated herein by reference. As a result, it is expected that
a certain degree of mismatch at the priming site is tolerated. Such mismatch may be small, such as a
mono-, di- or tri-nucleotide. Alternatively, it may encompass loops, which are defined as regions in
which there exists a mismatch in an uninterrupted series of four or more nucleotides.
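
The complementarity thresholds and the distinction between small mismatches and loops can be illustrated with a short sketch. The function names and any sequences passed to them are hypothetical and not part of the disclosure; the code assumes an ungapped alignment of a primer against a priming site of equal length, both written 5' to 3'.

```python
# Illustrative sketch only: function names are hypothetical.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def pairing_mask(primer: str, site: str) -> list[bool]:
    """True at each primer position that base-pairs with the priming site.

    Both sequences are written 5'->3' and must have the same non-zero
    length. The ungapped alignment is a simplifying assumption.
    """
    if not primer or len(primer) != len(site):
        raise ValueError("primer and site must be non-empty and equal in length")
    perfect = reverse_complement(site)
    return [p == q for p, q in zip(primer.upper(), perfect)]


def percent_complementarity(primer: str, site: str) -> float:
    """Percentage of primer positions that pair with the site."""
    mask = pairing_mask(primer, site)
    return 100.0 * sum(mask) / len(mask)


def longest_mismatch_run(primer: str, site: str) -> int:
    """Length of the longest uninterrupted run of mismatched positions."""
    longest = current = 0
    for paired in pairing_mask(primer, site):
        current = 0 if paired else current + 1
        longest = max(longest, current)
    return longest


def has_loop(primer: str, site: str) -> bool:
    """A loop is a mismatch spanning four or more consecutive nucleotides."""
    return longest_mismatch_run(primer, site) >= 4
```

A primer could then be screened against the 65%, 75% or 90% thresholds by comparing the value returned by `percent_complementarity` over a window of 14 to 25 nucleotides.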

Numerous factors influence the efficiency and selectivity of hybridization of the primer to a
second nucleic acid molecule. These factors, which include primer length, nucleotide sequence and/or
composition, hybridization temperature, buffer composition and potential for steric hindrance in the
region to which the primer is required to hybridize, will be considered when designing oligonucleotide
primers according to the invention.

A positive correlation exists between primer length and both the efficiency and accuracy with
which a primer will anneal to a target sequence. In particular, longer sequences have a higher melting
temperature (Tm) than do shorter ones, and are less likely to be repeated within a given target
sequence, thereby minimizing promiscuous hybridization. Primer sequences with a high G-C content or
that comprise palindromic sequences tend to self-hybridize, as do their intended target sites, since
unimolecular, rather than bimolecular, hybridization kinetics are generally favored in solution.
However, it is also important to design a primer that contains sufficient numbers of G-C nucleotide
pairings, since each G-C pair is bound by three hydrogen bonds, rather than the two that are found
when A and T bases pair to bind the target sequence, and therefore forms a tighter, stronger bond.
Hybridization temperature varies inversely with primer annealing efficiency, as does the concentration
of organic solvents, e.g. formamide, that might be included in a priming reaction or hybridization
mixture, while increases in salt concentration facilitate binding. Under stringent annealing conditions,
longer hybridization probes (of use, for example, in Northern analysis), or synthesis primers, hybridize
more efficiently than do shorter ones, which are sufficient under more permissive conditions. Stringent
hybridization conditions typically include salt concentrations of less than about 1 M, more usually less
than about 500 mM and preferably less than about 200 mM. Hybridization temperatures range from as
low as 0°C to greater than 22°C, greater than about 30°C, and (most often) in excess of about 37°C.
Longer fragments may require higher hybridization temperatures for specific hybridization. As several
factors affect the stringency of hybridization, the combination of parameters is more important than
the absolute measure of a single factor.
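
The dependence of melting temperature on length and G-C content can be made concrete with a rough estimate. The sketch below uses the Wallace rule of thumb (2°C per A or T, 4°C per G or C), which is an assumption brought in for illustration and is not a method named in this description; it is generally applied only to short oligonucleotides and ignores salt, formamide and primer concentration. The function names are hypothetical.

```python
# Illustrative sketch only: the Wallace rule is a swapped-in approximation,
# not a method taken from this description.

def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def wallace_tm_celsius(seq: str) -> int:
    """Rough Tm estimate: 2 degrees per A/T plus 4 degrees per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc
```

Under this approximation a longer primer, or one richer in G-C pairs, always receives a higher estimate, which mirrors the qualitative statements in the paragraph above.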

Oligonucleotide primers can be designed with these considerations in mind and synthesized
according to the following methods.

1. Oligonucleotide Primer Design Strategy

The design of a particular oligonucleotide primer for the purpose of sequencing or PCR
involves selecting a sequence that is capable of recognizing the target sequence, but has a minimal
predicted secondary structure. The oligonucleotide sequence binds only to a single site in the target
nucleic acid. Furthermore, the Tm of the oligonucleotide is optimized by analysis of the length and GC
content of the oligonucleotide. Furthermore, when designing a PCR primer useful for the amplification
of genomic DNA, the selected primer sequence does not demonstrate significant matches to
sequences in the GenBank database (or other available databases).

The design of a primer is facilitated by the use of readily available computer programs,
developed to assist in the evaluation of the several parameters described above and the optimization of
primer sequences. Examples of such programs are "PrimerSelect" of the DNAStar™ software
package (DNAStar, Inc.; Madison, WI), OLIGO 4.0 (National Biosciences, Inc.), PRIMER,
Oligonucleotide Selection Program, PGEN and Amplify (described in Ausubel et al., 1995, Short
Protocols in Molecular Biology, 3rd Edition, John Wiley & Sons). Primers are designed with
sequences that serve as targets for other primers to produce a PCR product that has known
sequences on the ends which serve as targets for further amplification (e.g. to sequence the PCR
product). If many different genes are amplified with specific primers that share a common 'tail
sequence', the PCR products from these distinct genes can subsequently be sequenced with a single
set of primers. Alternatively, in order to facilitate subsequent cloning of amplified sequences, primers
are designed with restriction enzyme site sequences appended to their 5' ends. Thus, all nucleotides of
the primers are derived from gene sequences or sequences adjacent to a gene, except for the few
nucleotides necessary to form a restriction enzyme site. Such enzymes and sites are well known in the
art. If the genomic sequence of a gene and the sequence of the open reading frame of a gene are
known, design of particular primers is well within the skill of the art.
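
The tailing strategy described above, in which a common tail or a restriction enzyme site is appended to the 5' end of a gene-specific primer, can be sketched as follows. The function names are hypothetical; the only sequence fact relied on is that GAATTC is the EcoRI recognition site, and any tail other than a restriction site would be chosen by the user.

```python
# Illustrative sketch only: helper names are hypothetical.
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence


def add_five_prime_tail(gene_specific: str, tail: str) -> str:
    """Return a primer (5'->3') with `tail` prepended to the 5' end."""
    return tail.upper() + gene_specific.upper()


def tail_primer_set(primers: dict[str, str], tail: str) -> dict[str, str]:
    """Prepend the same tail to every gene-specific primer in a set.

    Products amplified with the tailed primers all carry the tail sequence
    at their ends, so one primer matching the tail can prime each of them.
    """
    return {name: add_five_prime_tail(seq, tail) for name, seq in primers.items()}
```

In practice a few extra bases are often placed 5' of a restriction site to help the enzyme cut near the end of a product; that refinement is not described in the text and is omitted here.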
2. Synthesis 

The primers themselves are synthesized using techniques which are also well known in the
art. Once designed, oligonucleotides are prepared by a suitable method, e.g. the phosphoramidite
method described by Beaucage and Caruthers (1981, Tetrahedron Lett., 22:1859) or the triester
method according to Matteucci et al. (1981, J. Am. Chem. Soc., 103:3185), both incorporated herein
by reference, or by other chemical methods using either a commercially available automated
oligonucleotide synthesizer or VLSIPS™ technology.

B. Production of a Polynucleotide Sequence 

The invention discloses polynucleotide sequences comprising polymorphisms. The
polynucleotide sequences of the invention are specifically hybridizable to a mutant form of a gene and
are therefore useful for discriminating between a wild-type form of a gene and a mutant form of a
gene. The polynucleotide sequences of the invention may also be useful for expression of the encoded
protein or a fragment thereof. The invention also features antisense polynucleotide sequences
complementary to polynucleotide sequences comprising polymorphisms. Antisense polynucleotide
sequences are useful according to the invention for inhibiting expression of an allelic form of a gene.

The present invention utilizes polynucleotide sequences and fragments comprising RNA,
cDNA, genomic DNA, synthetic forms, and mixed polymers. The invention includes both sense and
antisense strands of the polynucleotide sequences. According to the invention, the polynucleotide
sequences may be chemically or biochemically modified or may contain non-natural or derivatized
nucleotide bases. Such modifications include, for example, labels, methylation, substitution of one or
more of the naturally occurring nucleotides with an analog, internucleotide modifications, such as
uncharged linkages (e.g. methyl phosphonates, phosphorodithioates, etc.), pendent moieties (e.g.,
polypeptides), intercalators (e.g. acridine, psoralen, etc.), chelators, alkylators, and modified linkages
(e.g. alpha anomeric nucleic acids, etc.). Also included are synthetic molecules that mimic
polynucleotides in their ability to bind to a designated sequence via hydrogen bonding and other
chemical interactions. Such molecules are known in the art and include, for example, those in which
peptide linkages substitute for phosphate linkages in the backbone of the molecule.

The polynucleotide may be a naturally occurring polynucleotide, or may be a structurally
related variant of such a polynucleotide having modified bases and/or sugars and/or linkages. The
term "polynucleotide" as used herein is intended to cover all such variants.

Modifications which may be made to the polynucleotide may include (but are not limited to)
the following types:

a) Backbone modifications

i) phosphorothioates (X or Y or W or Z = S or any combination of two or more with the
remainder as O),
e.g. Y=S (Stein et al., 1988, Nucleic Acids Res., 15:3209), X=S (Cosstick and Vyle, 1989,
Tetrahedron Letters, 30:4693), Y and Z=S (Brill et al., 1989, J. Amer. Chem. Soc., 111:2321)

ii) methylphosphonates (e.g. Z=methyl (Miller et al., 1980, J. Biol. Chem., 255:9569))

iii) phosphoramidates (Z = N-(alkyl)2, e.g. alkyl = methyl, ethyl, butyl) (Z = morpholine or
piperazine) (Agrawal et al., 1988, Proc. Natl. Acad. Sci. USA, 85:7079) (X or W = NH) (Mag and
Engels, 1988, Nucleic Acids Res., 16:3525)

iv) phosphotriesters (Z=O-alkyl, e.g. methyl, ethyl, etc.) (Miller et al., Biochemistry, 21:5468)

v) phosphorus-free linkages (e.g. carbamate, acetamidate, acetate) (Gait et al., 1974, J.
Chem. Soc. Perkin I, 1684; Gait et al., 1979, J. Chem. Soc. Perkin I, 1389)

b) Sugar modifications

i) 2'-deoxynucleosides (R=H)

ii) 2'-O-methylated nucleosides (R=OMe) (Sproat et al., 1989, Nucleic Acids Res., 17:3373)

iii) 2'-fluoro-2'-deoxynucleosides (R=F) (Krag et al., 1989, Nucleosides & Nucleotides,
8:1473)

c) Base modifications (for a review see Jones, 1979, Int. J. Biolog. Macromolecules, 1:194)

i) pyrimidine derivatives substituted in the 5-position (e.g. methyl, bromo, fluoro, etc.) or
replacing a carbonyl group by an amino group (Piccirilli et al., 1990, Nature, 343:33).

ii) purine derivatives lacking specific nitrogen atoms (e.g. 7-deaza adenine, hypoxanthine) or
functionalized in the 8-position (e.g. 8-azido adenine, 8-bromo adenine)

d) Polynucleotides covalently linked to reactive functional groups, e.g.:

i) psoralens (Miller et al., 1988, Nucleic Acids Res. Special Pub. No. 20:113), phenanthrolines
(Sun et al., 1988, Biochemistry, 27:6039), mustards (Vlassov et al., 1988, Gene, 72:313) (irreversible
cross-linking agents with or without the need for co-reagents)

ii) acridine (intercalating agents) (Helene et al., 1985, Biochimie, 67:777)

iii) thiol derivatives (reversible disulphide formation with proteins) (Connolly and Newman,
1989, Nucleic Acids Res., 17:4957)

iv) aldehydes (Schiff's base formation)

v) azido, bromo groups (UV cross-linking)

vi) ellipticines (photolytic cross-linking) (Perrouault et al., 1990, Nature, 344:358)

e) Polynucleotides covalently linked to lipophilic groups or other reagents capable of improving uptake
by cells, e.g.:

i) cholesterol (Letsinger et al., 1989, Proc. Natl. Acad. Sci. USA, 86:6553), polyamines
(Lemaitre et al., 1987, Proc. Natl. Acad. Sci. USA, 84:648), other soluble polymers (e.g. polyethylene
glycol)

f) Polynucleotides containing alpha-nucleosides (Morvan et al., Nucleic Acids Res., 15:3421)

g) Combinations of modifications a)-f)

It should be noted that such modified polynucleotides, while sharing features with
polynucleotides designed as "anti-sense" inhibitors, are distinct in that the compounds correspond to
sense-strand sequences and the mechanism of action depends on protein-nucleic acid interactions and
does not depend upon interactions with nucleic acid sequences.

1. Polynucleotide Sequences Comprising DNA 
a. Cloning 

Polynucleotide sequences comprising DNA can be isolated from cDNA or genomic libraries
(including YAC and BAC libraries) by cloning methods well known to those skilled in the art (Ausubel
et al., supra). Briefly, isolation of a DNA clone comprising a particular polynucleotide sequence
involves screening a recombinant DNA or cDNA library and identifying the clone containing the
desired sequence. Cloning will involve the following steps. The clones of a particular library are spread
onto plates, transferred to an appropriate substrate for screening, denatured, and probed for the
presence of a particular sequence. A description of hybridization conditions and methods for
producing labeled probes is included below.

The desired clone is preferably identified by hybridization to a nucleic acid probe or by
expression of a protein that can be detected by an antibody. Alternatively, the desired clone is
identified by polymerase chain amplification of a sequence defined by a particular set of primers
according to the methods described below.

The selection of an appropriate library involves identifying tissues or cell lines that are an
abundant source of the desired sequence. Furthermore, if the polynucleotide sequence of interest
contains regulatory sequence or intronic sequence, a genomic library is screened (Ausubel et al.,
supra).

b. Genomic DNA

Polynucleotide sequences of the invention are amplified from genomic DNA. Genomic DNA
is isolated from tissues or cells according to the following method.

To facilitate detection of a variant form of a gene from a particular tissue, the tissue is isolated
free from surrounding normal tissues. To isolate genomic DNA from mammalian tissue, the tissue is
minced and frozen in liquid nitrogen. Frozen tissue is ground into a fine powder with a prechilled
mortar and pestle, and suspended in digestion buffer (100 mM NaCl, 10 mM TrisCl, pH 8.0, 25 mM
EDTA, pH 8.0, 0.5% (w/v) SDS, 0.1 mg/ml proteinase K) at 1.2 ml digestion buffer per 100 mg of
tissue. To isolate genomic DNA from mammalian tissue culture cells, cells are pelleted by
centrifugation for 5 min at 500 x g, resuspended in 1-10 ml ice-cold PBS, repelleted for 5 min at 500 x
g and resuspended in 1 volume of digestion buffer.

Samples in digestion buffer are incubated (with shaking) for 12-18 hours at 50°C, and then
extracted with an equal volume of phenol/chloroform/isoamyl alcohol. If the phases are not resolved
following a centrifugation step (10 min at 1700 x g), another volume of digestion buffer (without
proteinase K) is added and the centrifugation step is repeated. If a thick white material is evident at
the interface of the two phases, the organic extraction step is repeated. Following extraction, the upper,
aqueous layer is transferred to a new tube, to which is added 1/2 volume of 7.5 M ammonium
acetate and 2 volumes of 100% ethanol. The nucleic acid is pelleted by centrifugation for 2 min at
1700 x g, washed with 70% ethanol, air dried and resuspended in TE buffer (10 mM TrisCl, pH 8.0, 1
mM EDTA, pH 8.0) at 1 mg/ml. Residual RNA is removed by incubating the sample for 1 hour at 37°C
in the presence of 0.1% SDS and 1 mg/ml DNase-free RNase, and repeating the extraction and
ethanol precipitation steps. The yield of genomic DNA according to this method is expected to be
approximately 2 mg DNA/1 g cells or tissue (Ausubel et al., supra). Genomic DNA isolated
according to this method can be used for Southern blot analysis, restriction enzyme digestion, dot blot
analysis or PCR analysis, according to the invention.
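
The volume and yield figures in this isolation method reduce to simple proportional arithmetic, sketched below. The function names are hypothetical, and the sketch assumes that the ammonium acetate (one-half volume) and ethanol (2 volumes) are both measured relative to the recovered aqueous layer, which the text does not state explicitly.

```python
# Illustrative sketch only: function names are hypothetical.

def digestion_buffer_ml(tissue_mg: float) -> float:
    """Digestion buffer volume at 1.2 ml per 100 mg of tissue."""
    return 1.2 * tissue_mg / 100.0


def precipitation_volumes_ml(aqueous_ml: float) -> tuple[float, float]:
    """Return (7.5 M ammonium acetate, 100% ethanol) volumes in ml.

    Assumption: both are scaled to the aqueous layer volume.
    """
    return 0.5 * aqueous_ml, 2.0 * aqueous_ml


def expected_yield_mg(tissue_mg: float) -> float:
    """Expected genomic DNA yield at roughly 2 mg DNA per g of tissue."""
    return 2.0 * (tissue_mg / 1000.0)
```

For example, 250 mg of tissue would call for about 3 ml of digestion buffer and would be expected to give on the order of 0.5 mg of genomic DNA.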

c. Restriction digest (of cDNA or genomic DNA)

Following the identification of a desired cDNA or genomic clone containing a particular
sequence, polynucleotides of the invention are isolated from these clones by digestion with restriction
enzymes.

The technique of restriction enzyme digestion is well known to those skilled in the art (Ausubel
et al., supra). Reagents useful for restriction enzyme digestion are readily available from commercial
vendors including New England Biolabs, Boehringer Mannheim, Promega, as well as other sources.

d. PCR

Polynucleotide sequences of the invention are amplified from genomic DNA or other natural
sources by the polymerase chain reaction (PCR). PCR methods are well known to those skilled in the
art.

PCR provides a method for rapidly amplifying a particular DNA sequence by using multiple
cycles of DNA replication catalyzed by a thermostable, DNA-dependent DNA polymerase to amplify
the target sequence of interest. PCR requires the presence of a nucleic acid to be amplified, two
single-stranded oligonucleotide primers flanking the sequence to be amplified, a DNA polymerase,
deoxyribonucleoside triphosphates, a buffer and salts.

The method of PCR is well known in the art. PCR is performed as described in Mullis and
Faloona, 1987, Methods Enzymol., 155: 335, herein incorporated by reference.

PCR is performed using template DNA (at least 1 fg; more usefully, 1-1000 ng) and at least
25 pmol of oligonucleotide primers. A typical reaction mixture includes: 2 µl of DNA, 25 pmol of
oligonucleotide primer, 2.5 µl of 10x PCR buffer 1 (Perkin-Elmer, Foster City, CA), 0.4 µl of 1.25
mM dNTP, 0.15 µl (or 2.5 units) of Taq DNA polymerase (Perkin-Elmer, Foster City, CA) and
deionized water to a total volume of 25 µl. Mineral oil is overlaid and the PCR is performed using a
programmable thermal cycler.
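
When many samples are amplified at once, the typical reaction mixture is usually scaled into a master mix. The sketch below assumes the listed volumes are microliter amounts for a 25 microliter reaction, and that the primers together occupy a user-supplied volume (the text gives only 25 pmol, so the volume depends on the stock concentration). The names, the default 10% overage and the primer volume are all hypothetical.

```python
# Illustrative sketch only: names, overage and primer volume are assumptions.
PER_REACTION_UL = {
    "10x PCR buffer": 2.5,
    "1.25 mM dNTP": 0.4,
    "Taq DNA polymerase": 0.15,
}
TEMPLATE_UL = 2.0   # template DNA is added to each tube separately
TOTAL_UL = 25.0     # final reaction volume


def water_per_reaction_ul(primer_ul_total: float) -> float:
    """Deionized water needed to bring one reaction to the total volume."""
    used = sum(PER_REACTION_UL.values()) + TEMPLATE_UL + primer_ul_total
    water = TOTAL_UL - used
    if water < 0:
        raise ValueError("components exceed the total reaction volume")
    return water


def master_mix_ul(n_reactions: int, primer_ul_total: float,
                  overage: float = 0.10) -> dict[str, float]:
    """Volumes of every template-free component for n reactions."""
    scale = n_reactions * (1.0 + overage)
    mix = {name: vol * scale for name, vol in PER_REACTION_UL.items()}
    mix["primers"] = primer_ul_total * scale
    mix["water"] = water_per_reaction_ul(primer_ul_total) * scale
    return mix
```

Each tube would then receive the per-reaction share of the mix (total volume minus template volume) followed by the template DNA.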

The length and temperature of each step of a PCR cycle, as well as the number of cycles, are
adjusted according to the stringency requirements in effect. Annealing temperature and timing are
determined both by the efficiency with which a primer is expected to anneal to a template and the
degree of mismatch that is to be tolerated. The ability to optimize the stringency of primer annealing
conditions is well within the knowledge of one of moderate skill in the art. An annealing temperature
of between 30°C and 72°C is used. Initial denaturation of the template molecules normally occurs at
between 92°C and 99°C for 4 minutes, followed by 20-40 cycles consisting of denaturation (94-99°C
for 15 seconds to 1 minute), annealing (temperature determined as discussed above; 1-2 minutes), and
extension (72°C for 1 minute). The final extension step is generally carried out for 4 minutes at 72°C,
and may be followed by an indefinite (0-24 hour) step at 4°C.
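
The cycling parameters above can be captured as a small data structure that checks a program against the stated ranges and totals the programmed hold times. The class name, field names and default values are hypothetical choices within those ranges; ramp times between temperatures and the optional 4°C hold are not included.

```python
# Illustrative sketch only: class and field names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    cycles: int = 30                 # 20-40 cycles
    denature_s: int = 30             # 15 seconds to 1 minute per cycle
    anneal_s: int = 60               # 1-2 minutes per cycle
    anneal_c: float = 55.0           # 30-72 degrees C
    extend_s: int = 60               # 1 minute at 72 degrees C
    initial_denature_s: int = 240    # 4 minutes
    final_extend_s: int = 240        # 4 minutes at 72 degrees C

    def validate(self) -> None:
        """Raise ValueError if a setting falls outside the stated ranges."""
        if not 20 <= self.cycles <= 40:
            raise ValueError("cycles must be between 20 and 40")
        if not 15 <= self.denature_s <= 60:
            raise ValueError("denaturation must last 15-60 seconds")
        if not 60 <= self.anneal_s <= 120:
            raise ValueError("annealing must last 60-120 seconds")
        if not 30.0 <= self.anneal_c <= 72.0:
            raise ValueError("annealing temperature must be 30-72 C")

    def hold_time_s(self) -> int:
        """Total programmed hold time in seconds, excluding ramping."""
        self.validate()
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return self.initial_denature_s + self.cycles * per_cycle + self.final_extend_s
```

With the defaults shown, the programmed holds add up to 4980 seconds, or 83 minutes, before any ramp time is counted.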

Several techniques for detecting PCR products quantitatively without electrophoresis may be
useful according to the invention in order to make it more suitable for easy clinical use. One of these
techniques, for which there are commercially available kits such as Taqman™ (Perkin Elmer, Foster
City, CA), is performed with a transcript-specific antisense probe. This probe is specific for the PCR
product (e.g. a nucleic acid fragment derived from a gene) and is prepared with a quencher and
fluorescent reporter probe complexed to the 5' end of the oligonucleotide. Different fluorescent
markers can be attached to different reporters, allowing for measurement of two products in one
reaction. When Taq DNA polymerase is activated, it cleaves off the fluorescent reporters of the
probe bound to the template by virtue of its 5'-to-3' nucleolytic activity. In the absence of the
quenchers, the reporters now fluoresce. The color change in the reporters is proportional to the
amount of each specific product and is measured by a fluorometer; therefore, the amount of each
color can be measured and the PCR product can be quantified. The PCR reactions can be performed
in 96-well plates so that samples derived from many individuals can be processed and measured
simultaneously. The Taqman™ system has the additional advantage of not requiring gel
electrophoresis and allows for quantification when used with a standard curve.
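
Quantification against a standard curve can be sketched as a straight-line fit. The example assumes, as a simplification, that the measured fluorescence signal is linear in the amount of product over the range of the standards; the function names and the use of ordinary least squares are illustrative choices rather than anything specified in this description.

```python
# Illustrative sketch only: a linear signal/amount relationship is assumed.

def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired standards")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate the amount in an unknown."""
    if slope == 0:
        raise ValueError("a flat standard curve cannot be inverted")
    return (signal - intercept) / slope
```

Standards of known amount are measured first, the curve is fitted once, and each unknown sample on the plate is then converted from signal to amount with the fitted slope and intercept.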
2. Polynucleotide Sequences Comprising RNA

The present invention also provides a polynucleotide sequence comprising RNA. A
polynucleotide comprising RNA is useful for detecting SNPs and polymorphisms by techniques
including but not limited to hybridization methods or the RNase protection method. A polynucleotide
comprising RNA is also useful as a template for the in vitro production of protein. A polynucleotide
comprising RNA is also useful for detecting and localizing specific mRNA sequences by in situ
hybridization.

Polynucleotide sequences comprising RNA can be produced according to the method of in 
vitro transcription. 

The technique of in vitro transcription is well known to those of skill in the art. Briefly, the
gene of interest is inserted into a vector containing an SP6, T3 or T7 promoter. The vector is linearized
with an appropriate restriction enzyme that digests the vector at a single site located downstream of
the coding sequence. Following a phenol/chloroform extraction, the DNA is ethanol precipitated,
washed in 70% ethanol, dried and resuspended in sterile water. The in vitro transcription reaction is
performed by incubating the linearized DNA with transcription buffer (200 mM TrisCl, pH 8.0, 40 mM
MgCl2, 10 mM spermidine, 250 mM NaCl [T7 or T3] or 200 mM TrisCl, pH 7.5, 30 mM MgCl2, 10 mM
spermidine [SP6]), dithiothreitol, RNase inhibitors, each of the four ribonucleoside triphosphates, and
either SP6, T7 or T3 RNA polymerase for 30 min at 37°C. To prepare a radiolabeled polynucleotide
comprising RNA, unlabeled UTP is omitted and 35S-UTP is included in the reaction mixture.
The DNA template is then removed by incubation with DNase I. Following ethanol precipitation, an
aliquot of the radiolabeled RNA is counted in a scintillation counter to determine the cpm/ml (Ausubel
et al., supra).

Alternatively, polynucleotide sequences comprising RNA are prepared by chemical synthesis 
techniques such as solid phase phosphoramidite (described above). 

3. Polynucleotide Sequences Comprising Oligonucleotides 

A polynucleotide sequence comprising oligonucleotides can be made by using oligonucleotide
synthesizing machines which are commercially available (described above).

4. Polynucleotide Sequences Encoding Fusion Proteins 
Polynucleotide sequences of the invention can be used to express the protein product (or
fragment thereof) of the gene of interest by inserting the polynucleotide sequence into an expression
vector. Expression vectors suitable for protein expression in mammalian cells, bacterial cells, insect
cells or plant cells are well known in the art and are described in Section H entitled "Production of a
Mutant Protein".

Polynucleotide sequences of the invention can be used to prepare hybrid polynucleotides
comprising a sequence of a gene adjacent to a sequence encoding a foreign protein or a fragment
thereof (e.g. lacZ, trpE, glutathione S-transferase or thioredoxin) or a protein tag (hemagglutinin or
FLAG). Such hybrid polynucleotides produce fusion proteins that are useful, according to the
invention, for improved expression and/or rapid isolation of a protein or protein fragment encoded by
the sequence of a gene. Hybrid polynucleotides are also useful as a source of antigen for the
production of antibodies.

Nucleic acid constructs comprising a polynucleotide of genomic, cDNA, synthetic or
semi-synthetic origin in association with a polynucleotide sequence encoding a foreign protein or a
fragment thereof (carrier sequence) can be generated by recombinant nucleic acid techniques well known in
the art (see Ausubel et al., supra). According to this method, the cloned gene is introduced into an
expression vector at a position located 3' to a carrier sequence coding for the amino terminus of a
highly expressed protein, an entire functional moiety of a highly expressed protein or the entire protein.
It is preferable to use a carrier sequence from an E. coli gene or from any gene that is expressed at
high levels in E. coli. It is often preferable to select a carrier sequence that will facilitate protein
purification, either with antibodies or with an affinity purification protocol that is specific for the
carrier protein being used. For example, the purification protocol can be designed in accordance with
the unique physical properties of the carrier protein (e.g. heat stability). Alternatively, the tag sequence
may encode a protein (e.g. glutathione S-transferase (GST)) which can be purified by a
chemical interaction (for example, glutathione purification of GST). Alternatively, some carrier
proteins, such as thioredoxin (Trx), can be selectively released from intact cells by osmotic shock or
freeze/thaw procedures. Often, proteins that are fused to these carrier proteins can be purified away
from intracellular contaminants by virtue of the physical attributes of the carrier protein (Ausubel et
al., supra).

To ensure that a fusion protein is useful, according to the invention, it may be necessary to
modify the expression protocol to produce a soluble protein. Because high-level expression
of certain proteins can lead to the formation of inclusion bodies, if a soluble protein is required it may
be necessary to modify the following variables. The temperature at which expression is induced can
affect inclusion body formation, since inclusion body formation is induced at higher temperatures (37°C
and 42°C) and inhibited at lower temperatures (30°C). In certain instances, lowering the total level of
protein expression can lead to an increase in the proportion of soluble protein that is produced. The
strain background of the cells in which the protein is being produced can affect the proportion of a
particular protein that is expressed in a soluble form. Furthermore, the choice of carrier protein can
affect the solubility of an expressed fusion protein (Ausubel et al., supra).

An additional problem that can be encountered when producing fusion proteins in E. coli is
formation of an unstable protein, or a protein that is cleaved at the site of the junction between the
carrier sequence and the sequence of the protein of interest. To decrease complications due to protein
instability one can arrange for the fusion protein to be expressed as insoluble aggregates. Alternatively,
one can express the fusion protein in E. coli strains that are deficient in proteases (Ausubel et al.,
supra).

Often it is useful to remove the carrier protein moiety from the protein of interest to facilitate
biochemical and functional analyses. Methods for cleavage of fusion proteins to remove the carrier
are known to those skilled in the art. The choice of a method is usually determined by the composition,
sequence, and physical characteristics of the particular protein. Reagents such as cyanogen bromide,
hydroxylamine or low pH can be used to chemically cleave fusion proteins. To avoid complications
resulting from chemical cleavage (e.g. the presence of chemical cleavage sites in the protein of
interest and/or the occurrence of side reactions resulting in protein modification), enzymatic cleavage
methods can be used. Enzymatic cleavage protocols are advantageous because they can be carried
out under relatively mild reaction conditions, and because they involve highly specific cleavage
reactions. Enzymes useful for enzymatic cleavage of fusion proteins include factor Xa, thrombin,
enterokinase, renin and collagenase (Ausubel et al., supra).

Recombinant constructs encoding fusion proteins wherein the carrier sequence is on the order
of 9-15 codons can be generated by PCR methods. According to this method, a PCR primer will be
designed to contain at least 13 nucleotides that are identical to the target sequence on either side of the
nucleotide sequence encoding the carrier sequence. Preferably, the PCR primer will also contain a
restriction enzyme site to facilitate cloning of the amplified product into an appropriate expression
vector. PCR will be carried out as described above and the sequence of the amplified product will be
confirmed by sequence analysis as described in Section D entitled "Isolation of a Wild Type Gene".

Alternatively, recombinant constructs encoding fusion proteins can be generated by
site/oligonucleotide directed mutagenesis (Ausubel et al., supra). According to the method of site
directed mutagenesis the DNA to be mutated is inserted into a plasmid which has an f1 origin of
replication. A mutagenesis oligonucleotide is designed to contain 13 bp that are 100% identical to the
target sequence on either side of a sequence coding for the 9-15 codons of carrier sequence that is to
be added by the mutagenesis protocol.
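The oligonucleotide layout described above (13 matching bases, the carrier coding sequence, 13 matching bases) can be illustrated with a short sketch. The function name, the example sequences and the insertion position are hypothetical and chosen only for illustration; the sketch also ignores strand orientation, so in practice the oligonucleotide may need to be reverse-complemented to anneal to the single-stranded template.

```python
def design_insertion_oligo(target: str, insert_pos: int, carrier_dna: str, flank: int = 13) -> str:
    """Build an oligo: `flank` target bases + carrier coding sequence + `flank` target bases.

    `insert_pos` is the 0-based index in `target` at which the carrier is to be inserted.
    """
    target = target.upper()
    carrier_dna = carrier_dna.upper()

    # The carrier must be a whole number of codons, 9-15 codons long as in the text.
    if len(carrier_dna) % 3 != 0:
        raise ValueError("carrier sequence must be a whole number of codons")
    codons = len(carrier_dna) // 3
    if not 9 <= codons <= 15:
        raise ValueError("carrier sequence should be 9-15 codons long")

    # Both flanks must lie entirely within the target sequence.
    if insert_pos < flank or insert_pos + flank > len(target):
        raise ValueError("not enough target sequence on one side of the insertion point")

    left = target[insert_pos - flank:insert_pos]
    right = target[insert_pos:insert_pos + flank]
    return left + carrier_dna + right


# Hypothetical 40-base target and a 9-codon carrier, for illustration only.
example_target = "ATGGCTAGCTAGGATCCGATCGATTACGCTAGCTAGGCTA"
example_carrier = "GCA" * 9
example_oligo = design_insertion_oligo(example_target, 20, example_carrier)
```

With a 9-codon carrier the resulting oligonucleotide is 13 + 27 + 13 = 53 bases long; a 15-codon carrier would give 71 bases.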

A single stranded preparation of the vector is prepared by the following method. Following
transformation of an appropriate bacterial strain (e.g. CJ236) with the recombinant plasmid and plating
of the bacteria on LB agar plates, a single resulting colony is grown in 4x5 ml of LB plus ampicillin for
1 hour at 37°C with vigorous shaking. M13KO7 helper phage (2 ml, approximately 10^10-10^11 plaque
forming units) is added and the bacteria are grown for an additional hour at 37°C with vigorous
shaking. Following the addition of 7 µl of kanamycin (50 mg/ml), the bacteria are grown overnight at
37°C with vigorous shaking. The following day bacterial cultures are pooled and cells are separated by
centrifugation. After the addition of 2.6 ml of 20% polyethylene glycol 200-800/2M NaCl to 20 ml of
bacterial supernatant, the sample is incubated for 1-1.5 hours on ice. The sample is pelleted by
centrifugation at 9000 rpm for 20 minutes. Following removal of the supernatant, residual supernatant
is removed by centrifugation at 3000 rpm for 5 minutes. The pellet is resuspended in 400 µl of TE,
extracted twice with phenol and four times with phenol:chloroform and ethanol precipitated. The
resulting pellet is resuspended in 40 µl TE.

Mutagenesis is performed by using a Muta-Gene kit (Bio-Rad, Hercules, CA) according to the
following method. To kinase the oligonucleotide primer, 1 µl (200 ng) of oligonucleotide is incubated in
the presence of 2 µl of 10X kinase buffer (0.5 M Tris, pH 8.0, 70 mM MgCl2, 10 mM DTT), 2 µl
10 mM rATP, 2 µl polynucleotide kinase and 13 µl H2O at 37°C for 1 hour. To carry out the
annealing and synthesis steps, 2.5 µl of single-stranded template are mixed with 1 µl of kinased
oligonucleotide, 1.0 µl of 10X annealing buffer (200 mM Tris-HCl, pH 7.4, 20 mM MgCl2, 500 mM
NaCl) and 5.5 µl H2O for 10 min at 65°C. The reaction mixture is slow-cooled to 37°C. Once the
sample has reached 37°C, the sample is spun briefly in a microfuge. Following the addition of 1.0 µl
of 10X synthesis buffer (5 mM each dATP, dCTP, dGTP, dTTP, 10 mM ATP, 100 mM Tris-HCl, pH
7.4, 50 mM MgCl2, 20 mM DTT), 1.0 µl T4 DNA ligase and 0.5 µl of T4 DNA polymerase, the
sample is incubated for 5 minutes on ice, 5 minutes at room temperature and 1 hour at 37°C. A 2 µl
aliquot of the sample is used to transform E. coli.

DNA is isolated from the transformed E. coli cells by mini prep methods known in the art
(Ausubel et al., supra), and sequenced according to methods known in the art (described in Section D
entitled "Isolation of a Wild Type Gene").

C. Production of a Nucleic acid Probe 

The invention discloses nucleic acid probes. Preferably, the nucleic acid probes of the
invention are specifically hybridizable to a mutant gene but not to a wild type form of a gene due to the
presence of one or more polymorphisms. These allele specific probes can be used to screen DNA
sequences of a gene which have been amplified by PCR, or are present in a genomic DNA or RNA
test sample. Hybridization of a particular allele specific probe to an amplified gene sequence, under
stringent conditions (described below), indicates that the polymorphism contained in the probe is
present in the amplified sequence. Hybridization of a particular allele specific probe to a test sample
comprising genomic DNA or RNA, under stringent conditions (described below), indicates that the
polymorphism contained in the probe is present in the nucleic acid of the test sample. Nucleic acid
probes that are specifically hybridizable to a wild type form of a gene but not to a mutant form of a
gene are also useful according to the invention.

In another embodiment, the probes of the claimed invention will be specific for a nucleic acid
region that is adjacent to a region that is thought to contain one or more polymorphisms. These probes
will be useful for detecting the presence of one or more polymorphisms in the adjacent region by the
method of primer extension (as described in Section F entitled "Identification and Characterization of
Polymorphisms").

In other embodiments, probes of the claimed invention will be used to detect a gain or loss of a
restriction enzyme site known to contain one or more polymorphisms of the claimed invention. Nucleic
acid probes, according to this embodiment, are able to detect a restriction enzyme fragment that is of a
size that can be easily separated on an agarose gel and visualized by Southern blot analysis. Probes
that are useful according to this embodiment of the claimed invention can be specific for any region
within a gene or outside of a gene.
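The gain or loss of a restriction enzyme site can be illustrated computationally with a minimal sketch that compares the fragment sizes expected from two alleles. The function names and the two 20-base sequences are hypothetical; EcoRI (recognition site GAATTC, cutting after the first G) is used only as a familiar example, and the DNA is assumed to be linear.

```python
def site_positions(seq: str, site: str) -> list:
    """Return the 0-based start index of every occurrence of `site` in `seq`."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int) -> list:
    """Return the fragment lengths produced by complete digestion of linear `seq`.

    `cut_offset` is the number of bases into the recognition site at which the
    enzyme cuts the top strand (1 for EcoRI, G^AATTC).
    """
    cuts = [p + cut_offset for p in site_positions(seq, site)]
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: a single base change (A to G) destroys the EcoRI site.
wild_type = "AAAAGAATTCAAAAAAAAAA"
variant = "AAAAGAGTTCAAAAAAAAAA"

wild_fragments = fragment_lengths(wild_type, "GAATTC", 1)
variant_fragments = fragment_lengths(variant, "GAATTC", 1)
site_lost = len(variant_fragments) < len(wild_fragments)
```

In a real assay the fragments would be far larger (hundreds to thousands of base pairs) so that they separate on an agarose gel; the short sequences here only make the arithmetic easy to follow.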

The nucleic acid probes of the invention are useful for a variety of hybridization-based
analyses including but not limited to Southern hybridization to genomic DNA, cDNA sequences or
PCR amplification products, Northern hybridization to mRNA and RNase protection assays, DNA
sequencing and isolation of genomic or cDNA clones of a gene. The probes may also be used to
determine whether mRNA encoded by a gene is present in a cell or tissue by the method of in situ
hybridization. These techniques are well known in the art and can be performed as described in
Ausubel et al., supra.

According to the methods of the above-referenced hybridization assays, polymorphisms
associated with alleles of a gene, which either predispose to a particular disease (e.g. osteoarthritis) or
are not associated with a particular disease (e.g. osteoarthritis), will be detected by the formation of a
stable hybrid consisting of a polynucleotide probe comprising one or more polymorphisms and a target
sequence that also comprises one or more polymorphisms, under stringent to moderately stringent
hybridization and wash conditions. If it is expected that the probes will be perfectly complementary to
the target sequence, stringent conditions will be used. Hybridization stringency may be lessened if
some mismatching is expected, for example, if variants are expected with the result that the probe will
not be completely complementary. Conditions are chosen which rule out nonspecific/adventitious
bindings, that is, which minimize noise. Since such indications identify neutral DNA polymorphisms as
well as mutations, these indications need further analysis (such as assays described in Section F
entitled "Identification and Characterization of Polymorphisms") to demonstrate detection of a
susceptibility allele of a gene.

Probes for alleles of a gene may be derived from genomic DNA or cDNA sequences
specific for the gene of interest. The probes may be of any suitable length, and may span all or a portion
of the region containing the gene. If the target sequence contains a sequence identical to that of the
probe, the probes may be short, e.g., in the range of about 8-30 base pairs, since the hybrid will be
relatively stable under even stringent conditions. If some degree of mismatch is expected with the
probe, i.e., if it is suspected that the probe will hybridize to a variant region, a longer probe may be
employed which hybridizes to the target sequence with the requisite specificity.

Probes according to the invention also include an isolated polynucleotide attached to a label or
a reporter molecule which may be useful for isolating other polynucleotide sequences having
sequence similarity by standard methods, including but not limited to the above-referenced
hybridization-based assays. Techniques for preparing and labeling probes (as described in Ausubel et
al., supra) are included below. A wide variety of labels and conjugation techniques are known by those
skilled in the art and can be used in various nucleic acid and amino acid assays. Means for producing
labeled hybridization or PCR probes for detecting related sequences include oligolabeling, nick
translation, end-labeling or PCR amplification using a labeled nucleotide. Alternatively, the
protein-encoding sequence, or any portion of it, may be cloned into a vector for the production of an mRNA
probe. Such vectors are known in the art, are commercially available, and may be used to synthesize
RNA probes in vitro by addition of an appropriate RNA polymerase such as T7, T3 or SP6 and
labeled nucleotides.

A number of companies such as Pharmacia Biotech (Piscataway NJ), Promega (Madison
WI) and US Biochemical Corp (Cleveland OH) supply commercial kits and protocols for these
procedures. Suitable reporter molecules or labels include those radionuclides, enzymes, fluorescent,
chemiluminescent, or chromogenic agents as well as substrates, cofactors, inhibitors, magnetic
particles and the like. Patents teaching the use of such labels include US Patents 3,817,838; 3,350,752;
3,939,350; 3,996,345; 4,277,437; 4,275,149 and 4,366,241. Also, recombinant immunoglobulins may be
produced as shown in US Patent No. 4,816,567, incorporated herein by reference.

Probes comprising synthetic oligonucleotides or other polynucleotides of the present invention
may be derived from naturally occurring or recombinant single- or double-stranded polynucleotides, or
be chemically synthesized.

Portions of the polynucleotide sequence having at least approximately 5 nucleotides,
preferably 9-15 nucleotides, fewer than about 6 Kb and usually fewer than about 1 Kb, from a
polynucleotide sequence encoding a gene are preferred as probes.

A DNA probe useful according to the present invention can be isolated from a gene or a
polynucleotide construct derived from a gene, or from a cDNA sequence specific for a gene or a
cDNA construct specific for a gene by the methods of PCR or restriction enzyme digestion, as
described above. Riboprobes useful according to the invention can be synthesized by the method of in
vitro transcription, or by chemical synthesis methods, as described above.

An oligonucleotide probe useful according to the invention can be designed, as described
above, and synthesized in a commercially available automated synthesizer.

Nucleic acid hybridization rate and stability will be affected by a variety of experimental
parameters including salt concentration, temperature, the presence of organic solvents, the viscosity of
the hybridization solution, the base composition of the probe, the length of the duplex, and the number
of mismatches between the hybridizing nucleic acids (Ausubel et al., supra), and as described in
Section A entitled "Design and Synthesis of Oligonucleotide Primers".

Southern blot analysis can be used to detect sequence variations in a gene from a PCR
amplified product or from a total genomic DNA test sample via a non-PCR based assay. The method
of Southern blot analysis is well known in the art (Ausubel et al., supra; Sambrook et al., 1989,
Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Laboratory Press, Cold
Spring Harbor, NY). This technique involves the transfer of DNA fragments from an electrophoresis
gel to a membrane support resulting in the immobilization of the DNA fragments. The resulting
membrane carries a semipermanent reproduction of the banding pattern of the gel.

Southern blot analysis is performed according to the following method. Genomic DNA (5-20
µg) is digested with the appropriate restriction enzyme and separated on a 0.6-1.0% agarose gel in
TAE buffer. The DNA is transferred to a commercially available nylon or nitrocellulose membrane
(e.g. Hybond-N membrane, Amersham, Arlington Heights, IL) by methods well known in the art
(Ausubel et al., supra; Sambrook et al., supra). Following transfer and UV cross linking, the membrane
is hybridized with a radiolabeled probe in hybridization solution (e.g. under stringent conditions in 5X
SSC, 5X Denhardt solution, 1% SDS) at 65°C. Alternatively, high stringency hybridization can be
performed at 68°C or in a hybridization buffer containing a decreased concentration of salt, for
example 0.1X SSC. The hybridization conditions can be varied as necessary according to the
parameters described in Section A entitled "Design and Synthesis of Oligonucleotide Primers".
Following hybridization, the membrane is washed at room temperature in 2X SSC/0.1% SDS and at
65°C in 0.2X SSC/0.1% SDS, and exposed to film. The stringency of the wash buffers can also be
varied depending on the amount of the background signal (Ausubel et al., supra).
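The SSC strengths quoted above can be converted to salt concentrations with a small sketch. The composition of 1X SSC (0.15 M NaCl, 0.015 M sodium citrate) is the conventional laboratory definition and is supplied here as an assumption, since it is not stated in the text; the function names are hypothetical.

```python
# Conventional composition of 1X SSC (assumption: not stated in the text).
SSC_1X_NACL_M = 0.15
SSC_1X_CITRATE_M = 0.015


def ssc_composition(fold: float) -> dict:
    """Return the molar NaCl and sodium citrate concentrations of `fold`X SSC."""
    return {
        "NaCl_M": fold * SSC_1X_NACL_M,
        "citrate_M": fold * SSC_1X_CITRATE_M,
    }


def most_stringent_first(folds: list) -> list:
    """Order SSC strengths from lowest salt to highest salt.

    At a fixed temperature, lower salt means higher stringency, so the first
    element is the most stringent buffer.
    """
    return sorted(folds)


# SSC strengths mentioned in the hybridization and wash steps above.
strengths = [5, 2, 0.2, 0.1]
table = {fold: ssc_composition(fold) for fold in strengths}
ordered = most_stringent_first(strengths)
```

Under this assumption the 5X SSC hybridization solution contains roughly 0.75 M NaCl, while the 0.2X SSC wash contains roughly 0.03 M NaCl, which is why the 65°C wash is the more stringent step.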

Detection of a nucleic acid probe-target nucleic acid hybrid will include the step of hybridizing
a nucleic acid probe to the DNA target. This probe may be radioactively labeled or covalently linked
to an enzyme such that the covalent linkage does not interfere with the specificity of the hybridization.
A resulting hybrid can be detected with a labeled probe. Methods for radioactively labeling a probe
include random oligonucleotide primed synthesis, nick translation or kinase reactions (see Ausubel et
al., supra). Alternatively, a hybrid can be detected via non-isotopic methods. Non-isotopically labeled
probes can be produced by the addition of biotin or digoxigenin, fluorescent groups, chemiluminescent
groups (e.g. dioxetanes, particularly triggered dioxetanes), enzymes or antibodies. Typically,
non-isotopic probes are detected by fluorescence or enzymatic methods. Detection of a radiolabeled
probe-target nucleic acid complex can be accomplished by separating the complex from free probe
and measuring the level of complex by autoradiography or scintillation counting. If the probe is
covalently linked to an enzyme, the enzyme-probe conjugate-target nucleic acid complex will be
isolated away from the free probe-enzyme conjugate and a substrate will be added for enzyme
detection. Enzymatic activity will be observed as a change in color development or luminescent output
resulting in a 10^-10* increase in sensitivity. An example of the preparation and use of nucleic acid
probe-enzyme conjugates as hybridization probes (wherein the enzyme is alkaline phosphatase) is
described in Jablonski et al., 1986, Nucleic Acids Res., 14:6115.

Two-step label amplification methodologies are known in the art. These assays are based on
the principle that a small ligand (such as digoxigenin, biotin, or the like) is attached to a nucleic acid
probe capable of specifically binding to a gene. Allele specific gene probes are also useful according
to this method.

According to the method of two-step label amplification, the small ligand attached to the
nucleic acid probe will be specifically recognized by an antibody-enzyme conjugate. For example,
digoxigenin will be attached to the nucleic acid probe and hybridization will be detected by an
antibody-alkaline phosphatase conjugate wherein the alkaline phosphatase reacts with a chemiluminescent
substrate. For methods of preparing nucleic acid probe-small ligand conjugates, see Martin et al.,
1990, BioTechniques, 9:762. Alternatively, the small ligand will be recognized by a second
ligand-enzyme conjugate that is capable of specifically complexing to the first ligand. A well known example
of this manner of small ligand interaction is the biotin-avidin interaction. Methods for labeling nucleic
acid probes and their use in biotin-avidin based assays are described in Rigby et al., 1977, J. Mol. Biol.,
113:237 and Nguyen et al., 1992, BioTechniques, 13:116.

Variations of the basic hybrid detection protocol are known in the art, and include
modifications that facilitate separation of the hybrids to be detected from extraneous materials and/or
that employ the signal from the labeled moiety. A number of these modifications are reviewed in, e.g.,
Matthews & Kricka, 1988, Anal. Biochem., 169:1; Landegren et al., 1988, Science, 242:229; Mittlin,
1989, Clinical Chem. 35:1819; U.S. Pat. No. 4,868,105, and in EPO Publication No. 225,807.

D. Isolation of a Wild Type Gene

A wild type version of a candidate gene according to the invention can be isolated by cloning
from an appropriately selected genomic library according to methods well known in the art. Methods
of cloning are described in Section B entitled "Production of a Polynucleotide Sequence."

The sequence of the cloned gene will be determined by sequencing methods well known in the
art (see Ausubel et al., supra and Sambrook et al., supra). Methods of sequencing employ such
enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US Biochemical Corp,
Cleveland, OH), Taq polymerase (Perkin Elmer, Norwalk, CT), thermostable T7 polymerase
(Amersham, Chicago, IL), or combinations of recombinant polymerases and proofreading
exonucleases such as the ELONGASE Amplification System (Gibco BRL, Gaithersburg, MD).
Preferably, the process is automated with machines such as the Hamilton Micro Lab 2200 (Hamilton,
Reno NV), Peltier Thermal Cycler (PTC200; MJ Research, Watertown, MA) and the ABI 377 DNA
sequencers (Perkin Elmer).

E. Isolation of a Mutant Gene

A mutant version of a candidate gene according to the invention can be isolated by cloning 
from an appropriately selected genomic library according to methods well known in the art. Methods 
of cloning are described in Section B entitled "Production of a Polynucleotide Sequence." 

The sequence of the cloned gene will be determined by sequencing methods described in
Section D entitled "Isolation of a Wild Type Gene."

F. Identification and Characterization of Polymorphisms 

a. Identification of SNPs by in silico methods (isSNPs) 

1. Identification of Polymorphisms in Candidate Genes 
The starting point is a set of experimentally derived nucleic acid sequences. In order to be
useful for SNP discovery by the invention, it is preferred that the sequences have complete
chromatogram files from a gel or capillary electrophoresis sequencing machine. When these are not
available, quality score data which assign a score to each base in the sequence indicating the
likelihood of error for the basecall may be used. If neither of these data are available, the sequence
may be used to assist the clustering of other sequences and in some cases to provide additional
verification for a discovered SNP, but is not used by the invention for the identification of the
polymorphism.

The population of sequences used may constitute either a database of cDNA-derived
sequences or genomic sequence. In a preferred embodiment, sequences used by the invention are
from an assembled cDNA database, such as the LifeSeqGold database (Incyte Genomics,
Inc. (Incyte), Palo Alto, CA).

Derivation of Nucleic Acid Sequences 

cDNA was isolated from libraries constructed using RNA derived from normal and diseased
human tissues and cell lines. The human tissues and cell lines used for cDNA library construction
were selected from a broad range of sources to provide a diverse population of cDNAs representative
of gene transcription throughout the human body. Descriptions of the human tissues and cell lines
used for cDNA library construction are provided in the LIFESEQ database (Incyte Pharmaceuticals,
Inc. (Incyte), Palo Alto CA). Human tissues were broadly selected from, for example,
cardiovascular, dermatologic, endocrine, gastrointestinal, hematopoietic/immune system,
musculoskeletal, neural, reproductive, and urologic sources.

Cell lines used for cDNA library construction were derived from, for example, leukemic cells,
teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells. Such
cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell lines
commonly used and available from public depositories (American Type Culture Collection, Manassas
VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical agent such as
5'-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in the case of
leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.

Sequencing of the cDNAs 

Methods for DNA sequencing are well known in the art. Conventional enzymatic methods
employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S.
Biochemical Corporation, Cleveland OH), Taq polymerase (The Perkin-Elmer Corporation
(Perkin-Elmer), Norwalk CT), thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham
Pharmacia Biotech), Piscataway NJ), or combinations of polymerases and proofreading exonucleases
such as those found in the ELONGASE amplification system (Life Technologies Inc. (Life
Technologies), Gaithersburg MD), to extend the nucleic acid sequence from an oligonucleotide primer
annealed to the DNA template of interest. Methods have been developed for the use of both
single-stranded and double-stranded templates. Chain termination reaction products may be electrophoresed
on urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled
nucleotides) or by fluorescence (for fluorophore-labeled nucleotides). Automated methods for
mechanized reaction preparation, sequencing, and analysis using fluorescence detection methods have
been developed. Machines used to prepare cDNAs for sequencing can include the MICROLAB
2200 liquid transfer system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler
(PTC200; MJ Research, Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal
cycler (Perkin-Elmer). Sequencing can be carried out using, for example, the ABI 373 or 377
(Perkin-Elmer) or MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale
CA) DNA sequencing systems, or other automated and manual sequencing systems well known in the
art.

The nucleotide sequences have been prepared by current, state-of-the-art, automated methods
and, as such, may contain occasional sequencing errors or unidentified nucleotides. Such unidentified
nucleotides are designated by an N. These infrequent unidentified bases do not represent a hindrance
to practicing the invention for those skilled in the art. Several methods employing standard
recombinant techniques may be used to correct errors and complete the missing sequence information.
(See, e.g., those described in Ausubel, F.M. et al. (1997) Short Protocols in Molecular Biology, John
Wiley & Sons, New York NY; and Sambrook, J. et al. (1989) Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Press, Plainview NY.)

Assembly of cDNA Sequences 

Human polynucleotide sequences may be assembled using programs or algorithms well known
in the art. Sequences to be assembled are related, wholly or in part, and may be derived from a single
or many different transcripts. Assembly of the sequences can be performed using such programs as
PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly system (GCG),
or other methods known in the art.

Alternatively, cDNA sequences are used as "component" sequences that are assembled into
"template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified,
and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway
known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Pharmaceuticals, Palo
Alto, CA). A series of BLAST comparisons is performed and low-information segments and
repetitive elements (e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to
prevent spurious matches. Mitochondrial and ribosomal RNA sequences are also removed. The
processed sequences are then loaded into a relational database management system (RDMS) which
assigns edited sequences to existing templates, if available. When additional sequences are added into
the RDMS, a process is initiated which modifies existing templates or creates new templates from
works in progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences
themselves. After the new sequences have been assigned to templates, the templates can be merged
into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.

A resultant template sequence may contain either a partial or a full length open reading frame,
or all or part of a genetic regulatory element. This variation is due in part to the fact that the full
length cDNAs of many genes are several hundred, and sometimes several thousand, bases in length.
With current technology, cDNAs comprising the coding regions of large genes cannot be cloned
because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete "second
strand" synthesis. Template sequences may be extended to include additional contiguous sequences
derived from the parent RNA transcript using a variety of methods known to those of skill in the art.
Extension may thus be used to achieve the full length coding sequence of a gene.

Analysis of the cDNA Sequences 

The cDNA sequences are analyzed using a variety of programs and algorithms which are
well known in the art. (See, e.g., Ausubel, supra, Chapter 7.7; Meyers, R. A. (Ed.) (1995) Molecular
Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853). These analyses comprise both
reading frame determinations, e.g., based on triplet codon periodicity for particular organisms (Fickett,
J. W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and stop codons; and
homology searches.
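The analysis of potential start and stop codons mentioned above can be illustrated with a minimal sketch that scans the three forward reading frames for an ATG followed by an in-frame stop codon. This is not the method of the cited references: codon periodicity statistics and the reverse strand are not considered, and the function name and example sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"


def find_orfs(seq: str, min_codons: int = 1) -> list:
    """Return (frame, start, end) for each ATG...stop stretch on the forward strand.

    `start` is the index of the A in ATG and `end` is one past the stop codon,
    so seq[start:end] is the full open reading frame including the stop.
    `min_codons` counts the codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through the sequence one codon at a time in this frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Hypothetical 16-base sequence with a single short reading frame in frame 2.
example_seq = "CCATGAAATTTTAGCC"
example_orfs = find_orfs(example_seq)
```

For the example sequence the only reading frame found starts at index 2 and spans ATG AAA TTT TAG; a partial reading frame that runs off the end of a template without a stop codon is deliberately not reported by this sketch.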
Computer programs known to those of skill in the art for performing computer-assisted
searches for amino acid and nucleic acid sequence similarity include, for example, Basic Local
Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al.
(1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and
comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally maximal
and for which the alignment score meets or exceeds a threshold or cutoff score set by the user
(Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search tool
(e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be
searched for sequences containing regions of homology to a query rbosm or RBOSM of the present
invention.

Other approaches to the identification, assembly, storage, and display of nucleotide and
polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein in their entirety.

Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif,
BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example, in
"Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence
Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.

Identification of Sequence Variants and Polymorphisms 

The method comprises a series of filters to distinguish isSNPs from other sequencing variants and
errors. The filters can be grouped into the following five sets of filters by the order of application in
the method:

Preliminary Filters: the main filter in the first group removes the majority of base call errors by
requiring a minimum phred quality score of 15. Additional filters at this stage deal with sequence
alignment errors as well as errors resulting from improper trimming of vector sequence, chimeras and
splice junctions.

Advanced Chromatogram Analysis: additional base call errors are then detected by examining 
the original chromatogram files in the vicinity of a putative SNP by an automated procedure resulting 
in a set of SNPs wherein the base call error rate is reduced to less than 5%. 

Clone Error Filters: errors introduced during laboratory processing such as those caused by
reverse transcriptase, polymerase or somatic mutation are among the most difficult to distinguish from
true SNPs. The Clone Error filters use statistically generated algorithms to identify these sources of
error. A small percentage of actual SNPs will be discarded at this stage.

Clustering Error Filters: these types of errors result from the incorrect clustering of close
homologs or pseudogenes, or from contamination by nonhuman sequences. The filters developed to
minimize these clustering errors are also statistically based. As above, these filters may reject a
fraction of actual SNPs.

Finishing Filters: these filters remove duplicate and redundant SNPs from the generated list of
SNPs, and remove SNPs which are from the hypervariable regions of hypervariable genes such as
immunoglobulin and T cell receptors.
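The only numerical threshold given for the filters above is the minimum phred quality score of 15 in the Preliminary Filters. A minimal sketch of such a quality filter follows. It relies on the standard phred definition, Q = -10 log10(P), where P is the estimated probability that the base call is wrong; the function names are hypothetical, and replacing rejected calls with "N" is an illustrative choice rather than a step described in the text.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a phred quality score to the estimated base call error probability."""
    return 10 ** (-q / 10)


def filter_base_calls(bases: str, quals: list, min_q: int = 15) -> str:
    """Replace every base whose phred score is below `min_q` with 'N'.

    `bases` and `quals` must have the same length, one score per base call.
    """
    if len(bases) != len(quals):
        raise ValueError("each base call needs exactly one quality score")
    return "".join(b if q >= min_q else "N" for b, q in zip(bases, quals))


# Hypothetical base calls and scores, for illustration only.
filtered = filter_base_calls("ACGT", [30, 10, 15, 14])
error_at_threshold = phred_to_error_probability(15)
```

A phred score of 15 corresponds to an estimated error probability of about 3.2% per base call, so calls that only just pass this first filter can still be wrong, consistent with the later chromatogram-based stage being needed to reduce the error rate further.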
Pre-processing steps 

The sequences must first be trimmed to eliminate vector sequence, contamination and
repetitive sequences. Then certain low information content sequences (for example, long runs of a
single base, or two- or three-base repeats) and repetitive sequences (for example, Alu sequences in
humans) must be masked (changed to N's) to prevent over-clustering errors. The clustering process
then identifies the sets of sequences that are believed to be derived from the same original DNA
sequence or gene. The sequences in each cluster are then aligned using a method such as phrap, which
also defines a consensus sequence. It will be well recognized by those skilled in the art that there are
numerous existing programs for carrying out these processes, and the SNP discovery process
described herein will work equally well with any of them. In the instant embodiment, the preferred
processes are Blocked 1 for trimming and masking, a variety of different algorithms for clustering, and
phrap for the alignment. It will be recognized by those skilled in the art that phrap and other alignment
methods carry out a secondary clustering step which divides clusters into contigs, and carry out a
secondary trimming step which defines the end points of the portion of each sequence which
participates in the contig. The contigs then may be searched for the occurrence of SNPs.
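
The masking step described above can be illustrated with a minimal sketch. It is not the trimming and masking program named in this embodiment; the function name `mask_low_information` and the 12-base minimum run length are assumptions chosen only for illustration, since the text does not state a threshold.

```python
import re

def mask_low_information(seq, min_run=12):
    """Replace long runs of a single base, or of two- or three-base repeats, with N's.

    min_run is a hypothetical threshold (total run length in bases); the source
    text does not specify one.
    """
    masked = seq.upper()
    for unit_len in (1, 2, 3):
        # Number of repeat units needed to reach min_run bases (ceiling division).
        min_repeats = -(-min_run // unit_len)
        # A unit of unit_len bases followed by at least (min_repeats - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        masked = pattern.sub(lambda m: "N" * len(m.group(0)), masked)
    return masked
```

Masked positions keep their coordinates, so a region masked in this way could later be restored from the original sequence if unmasking after clustering is desired.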

Errors in the trimming, clustering and alignment processes will cause SNP discovery errors,
usually false positives (the prediction of SNPs where they do not exist). Additional filters which are
the subject of the invention are designed to recognize and remove these errors by providing the ability
to identify likely errors in the processes and to correct them.

In some instances, it is preferred, as an optional step, to unmask, after clustering, regions of
sequences which were masked (because of low information content or repetitive sequence) during the
clustering process, to allow discovery of SNPs within these regions.

Identification of Candidate SNP Sequences 

The first step in identifying candidate SNP sequences is to redefine the end points of each
sequence as the points within the previous end points where a stretch of at least 10 consecutive base
calls, containing at least eight base changes, matches the consensus sequence exactly. Sequence
trimming errors (both at the single sequence stage and at the alignment stage) contribute to false
positives when foreign sequence (vector, chimera or splice variant) is similar to the real sequence and
the true boundary is difficult to determine. This step is a conservative approach to avoid false
positives and also filters out lower-quality sequence at the ends. The reason the length of the match
with a consensus is measured in base changes is to avoid low-significance matches on repetitive
sequence such as polyA.
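
The end point rule above can be sketched as follows. This is an illustrative reading, not the implementation of the embodiment: it assumes the read and the consensus are supplied as equal-length aligned strings, and it counts a "base change" as a position whose base differs from the preceding base in the read, so that a polyA stretch contributes no changes. The function names are hypothetical.

```python
def trusted_start(read, consensus, min_len=10, min_changes=8):
    """Return the first index that begins an exact-match stretch of at least
    min_len bases containing at least min_changes base changes, or None."""
    n = min(len(read), len(consensus))
    for start in range(n):
        changes = 0
        for pos in range(start, n):
            if read[pos] != consensus[pos]:
                break  # the exact-match stretch is interrupted
            if pos > start and read[pos] != read[pos - 1]:
                changes += 1  # base differs from its predecessor
            if pos - start + 1 >= min_len and changes >= min_changes:
                return start
    return None

def redefine_end_points(read, consensus):
    """Return new (start, end) as a half-open interval, or None if no stretch qualifies.
    Assumes read and consensus are aligned strings of equal length."""
    start = trusted_start(read, consensus)
    # Apply the same rule from the other end by reversing both strings.
    from_end = trusted_start(read[::-1], consensus[::-1])
    if start is None or from_end is None:
        return None
    return start, len(read) - from_end
```

For example, a read that matches the consensus only over a polyA tail is rejected outright, because the number of base changes in the matching stretch never reaches eight.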

The next step is, at each position of the alignment, to compare the base calls of all the aligned
sequences which are between their start and end positions, which have quality scores greater than
a set threshold, which have neighboring base calls which agree with the consensus sequence, and
where the neighboring base calls also have a quality score greater than the threshold. Preferably the
threshold is a phred quality score greater than or equal to 15. The possibilities are A, C, G, T, and
- (deletion).

The next step is a Clone Filter where, if there has been more than one base call for a
sequence position, the clone for each sequence is identified and the sequences corresponding to
each clone are compared. If the base calls for different sequences from the same clone disagree,
then all the sequences for this clone at this base position are removed from consideration.
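
A minimal sketch of this Clone Filter at a single alignment position is given below. The data layout (a list of clone identifier and base call pairs) and the function name are assumptions made for illustration only.

```python
from collections import defaultdict

def clone_filter(calls):
    """calls: list of (clone_id, base) pairs observed at one alignment position.

    Returns the calls that survive: every sequence from a clone whose own
    sequences disagree with each other at this position is dropped.
    """
    bases_by_clone = defaultdict(set)
    for clone_id, base in calls:
        bases_by_clone[clone_id].add(base)
    return [(clone_id, base) for clone_id, base in calls
            if len(bases_by_clone[clone_id]) == 1]
```

If more than one distinct base call remains after this step, the position is a candidate SNP.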

After all of these filters, positions for which there is more than one base call are candidate
SNPs. The "wild type" base call is the one in the consensus sequence and the others are designated
candidate SNPs. If the wild type base call is a deletion, then the SNP is considered to be an insertion
at the previous base.

Automated Chromatogram Checking 

The next filters require opening of the chromatogram files for the sequences identified as
containing candidate SNPs. At each candidate SNP position, the chromatogram data of each
sequence passing the Identification Filters is extracted. The first step in this process utilizes a program
ABIdump to translate binary ABI chromatogram files into usable form.

Multiple Base Call Algorithm filter: the ABI base calls for each sequence are compared to the
phred base calls. If the base calls do not agree at the SNP position and the two adjacent flanking
positions, then the sequences are removed from consideration.

Intensity Filter: if the SNP is a single base change (this step is skipped for insertions and
deletions), then the processed intensity values for each of the four bases at the called chromatogram
location of the candidate SNP base are used to compute a ratio. If we call the intensity of the wild
type base "wt", the intensity of the SNP base "snp", the minimum of the other two "min", and the
phred quality of the base call "Q", then the wild type sequences must have

(snp-min) < (wt-min)(Q-17)/37 and Q>=17 to be considered high-quality, and
(snp-min) < (wt-min)(Q-4)/37 and Q>=15 to be considered a low quality pass.

The basis for these formulas is that if a base is miscalled, then there is likely to be a residual peak for
the correct base. The larger the peak for the wild type base, the less likely that the call of the SNP is
correct. The actual thresholds in the formulas are based on empirical data from clones which were
sequenced multiple times and which gave a set of confirmed SNPs and error rates for algorithm
optimization.

The candidate SNP passes only if at least one wild type sequence passes and at least one
SNP sequence passes. The quality of the candidate SNP is the lower of the highest wild type pass
level and the highest SNP pass level (if there is a high-quality wild type sequence but only low quality
SNP sequences, then the candidate is low quality). A SNP quality value is returned.
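
The Intensity Filter and the resulting SNP quality value can be sketched as below. The two inequalities are taken from the text for wild type sequences; applying the same test to SNP sequences with the roles of the two alleles exchanged is an assumption, as are the function names and the numeric encoding of the pass levels.

```python
HIGH, LOW, FAIL = 2, 1, 0  # hypothetical encoding of the pass levels

def pass_level(called, competing, other_min, q):
    """Classify one sequence at the candidate position.

    called    : intensity of the base that was called (wt for a wild type read)
    competing : intensity of the other allele (snp for a wild type read)
    other_min : minimum intensity of the remaining two bases ("min")
    q         : phred quality of the base call ("Q")
    """
    excess_competing = competing - other_min
    excess_called = called - other_min
    if q >= 17 and excess_competing < excess_called * (q - 17) / 37:
        return HIGH
    if q >= 15 and excess_competing < excess_called * (q - 4) / 37:
        return LOW
    return FAIL

def candidate_quality(wild_type_levels, snp_levels):
    """Lower of the best wild type pass level and the best SNP pass level.
    A result of FAIL means the candidate does not pass."""
    best_wild_type = max(wild_type_levels, default=FAIL)
    best_snp = max(snp_levels, default=FAIL)
    return min(best_wild_type, best_snp)
```

With a high-quality wild type sequence and only low quality SNP sequences, `candidate_quality` returns `LOW`, matching the parenthetical example in the text.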



Clone Error Quality Filters (somatic mutation/reverse transcriptase/polymerase errors)

The purpose of these filters is to remove errors which are actually in the clone, that is, the
clone sequence was correct but the clone does not represent the individual being sequenced. Three
possible sources of these errors are somatic mutations, errors made by reverse transcriptase in the
process of making cDNA, and DNA polymerase errors in those situations where the DNA has been
amplified by PCR at some point prior to inserting in the cloning vector. Somatic mutations can be a
particular problem in sequencing clones derived from cell lines.

Polymerase errors are specific to the type of sequencing protocol used. For example, reverse
transcriptase is involved in EST sequencing but not genomic clone sequencing. Polymerase is involved
in the creation of extension clones (polymerase is used in all sequencing reactions, but errors are less
likely to arise because only a fraction of the templates are affected, in contrast to the extension
process where a single polymerase product becomes a template for the entire reaction). This filter is
not applied to genomic sequences in the current embodiment on the premise that the genomic
sequences do not have polymerase errors, and that somatic mutations are likely to have the same
profile as real SNPs.

This filter also filters out rare SNPs as well as apparent SNPs which are not real. It is
difficult to determine and confirm by experiments to what extent SNP candidates are too rare to be
confirmed vs. simply not real. For many applications, very rare SNPs are of less utility than common
ones, such that this is not a problem; however, in some applications it may be advisable to turn this
filter off.



Base change sequence analysis filter

The premise of this filter is that the probabilities of different mutations differ depending on
the source. For example, true SNPs may be mostly transitions whereas reverse transcriptase
mutations could be primarily G to T mutations. While this does not allow one to determine for sure
that a given change is a true SNP, it allows one to evaluate the relative likelihood that a given mutation
is a true SNP. SNP confirmation data suggest that G/T SNP candidates in which there is only one
clone having the T allele have a very low probability of being real SNPs. These SNP candidates are
excluded from the high confidence set (they are kept in a different file; their confirmation rate is well
below 50 percent). The other set which had a very low confirmation rate is any A/T SNP.

Frequency Filter 

This filter is based on the concept that true SNPs have a different frequency profile than
clone errors and that a candidate SNP which is evident in only one clone in a deep alignment is less
likely to be real than one which appears in one clone in a shallow alignment. The likelihood of finding
a SNP at a given sequence location is a function of the number of chromosomes sequenced. This
curve is distinctly non-linear, as most SNPs are sufficiently frequent to be found with relatively few
sequences. The probability of an error of this type, however, is essentially linear in the number of
sequences, since the chance of the change occurring in two different sequences is independent. This
means that the probability that a candidate SNP observed in a single clone is a true SNP is lower if the
alignment is deep than if it is shallow. Any SNP occurring in a single clone in an alignment of more
than 20 clones (counting only high-quality sequences which have a chance of contributing a candidate
SNP) is excluded from the high confidence set.
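
The exclusion rule stated at the end of the preceding paragraph reduces to a simple test, sketched here. The function and parameter names are hypothetical; the depth is assumed to have already been counted using only high-quality sequences that could contribute a candidate SNP.

```python
def passes_frequency_filter(clones_with_snp, clones_in_alignment, max_depth=20):
    """Return False for a SNP seen in a single clone in an alignment of more
    than max_depth clones; such a SNP is excluded from the high confidence set."""
    return not (clones_with_snp == 1 and clones_in_alignment > max_depth)
```

A singleton in an alignment of exactly 20 clones is retained under this reading, since the text excludes only alignments of "more than 20 clones".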

This filter is the basis of a secondary method used to develop the base change sequence
analysis filter. Comparing the set of single clone SNPs from shallow alignments with those from deep
alignments, which are more likely to be errors, will reveal base changes which are more likely to be
associated with polymerase errors and somatic mutations.

Clustering Error Filters 

These filters are intended to remove candidate SNPs which result from the incorrect
clustering of similar sequences such as highly homologous genes, similar genomic sequences, and
contamination from other species where the sequences of the species have been mislabeled as
human.



Number of base change filter 

This filter distinguishes homologous sequences from SNPs on the basis of the frequency of
variants. True SNPs occur about once per kb when comparing two sequences, or once per 2 kb if the
length of the sequences is included, and this fraction decreases as the depth of the alignment increases.
Since EST sequences tend to be about 500 bp or less in length, it would be expected to have not
more than one SNP per four sequences. The number of SNPs in the cluster is divided by the number
of sequences in the cluster, and SNPs for which this number is larger than one are discarded. The
higher the number, the less likely the SNP is to be real. The threshold value of one was chosen
because it appears to correspond to roughly a 50 percent success rate; however, the threshold value
could be adjusted to a higher value to accept lower confidence SNPs.
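
The arithmetic behind the "one SNP per four sequences" expectation, and the ratio test itself, can be written out as a short sketch. The function names are hypothetical, and the default values simply restate the figures given in the text.

```python
def expected_snps_per_sequence(read_length_bp=500, snp_rate_per_bp=1 / 2000):
    """Expected number of true SNPs per EST read: 500 bp at one SNP per 2 kb."""
    return read_length_bp * snp_rate_per_bp

def passes_base_change_filter(snps_in_cluster, sequences_in_cluster, threshold=1.0):
    """Keep the SNPs of a cluster only if SNPs per sequence does not exceed the threshold."""
    return snps_in_cluster / sequences_in_cluster <= threshold

print(expected_snps_per_sequence())  # 0.25, i.e. one SNP per four sequences
```

A cluster with 15 candidate SNPs across 10 sequences gives a ratio of 1.5 and is discarded, whereas 3 candidates across 12 sequences gives 0.25 and is kept.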

Distance from next polymorphism filter 

This filter calculates the number of SNPs for which the sequence is the only representative
within a window of 100 bases on either side, and discards any of the SNPs for which there is more
than one other SNP in this window. This threshold can be set higher, but the actual fraction of SNP
candidates which are true SNPs drops off to less than 50 percent.

Haplotype clustering filter

When sequences from different sources are inappropriately clustered, it is possible to divide
them into two or more clusters which are consistent. In particular, if we take any two differences
between homologs and consider the haplotypes of the clones which overlap both SNPs, there are only
two haplotypes. In other words, a 2x2 matrix of haplotypes is diagonal, having only two non-zero
entries. If there are only two sequences, then this is expected. For each SNP, a 2x2 haplotype matrix
with each other SNP is computed. If it is diagonal, and there are more than two sequences, then the
sum of the diagonal elements minus one is a "cluster total" for this SNP. This "cluster total" number
has proven to be empirically correlated with the confirmation rate, probably because it predicts
clusters which contain paralogs, homologs and contamination from other species. Candidate SNPs
which have a cluster number of less than eight are kept. This threshold value for the cluster total can
be varied.
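
The pairwise haplotype matrix and cluster total can be sketched as follows for a single pair of SNPs. The data layout (each clone as a mapping from SNP position to 0 for the consensus allele or 1 for the variant allele) and the function names are assumptions. How totals from several partner SNPs are combined is not stated in the text, so this sketch deliberately stops at one pair.

```python
def haplotype_matrix(clones, snp_a, snp_b):
    """Count haplotypes over the clones that overlap both SNP positions.

    clones: list of dicts mapping SNP position -> 0 (consensus) or 1 (variant).
    Returns a 2x2 matrix m where m[i][j] counts clones with allele i at snp_a
    and allele j at snp_b.
    """
    m = [[0, 0], [0, 0]]
    for clone in clones:
        if snp_a in clone and snp_b in clone:
            m[clone[snp_a]][clone[snp_b]] += 1
    return m

def pair_cluster_total(m):
    """Sum of the diagonal minus one if the matrix is diagonal with two non-zero
    entries and more than two sequences contribute; otherwise 0."""
    total = sum(map(sum, m))
    diagonal = m[0][1] == 0 and m[1][0] == 0 and m[0][0] > 0 and m[1][1] > 0
    if diagonal and total > 2:
        return m[0][0] + m[1][1] - 1
    return 0

def keep_candidate(cluster_total, threshold=8):
    """Candidates with a cluster total below the threshold are kept."""
    return cluster_total < threshold
```

For instance, five clones carrying the consensus allele at both positions and four carrying the variant allele at both give a diagonal matrix with a cluster total of 8, so the candidate would not be kept at the stated threshold.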

Redundancy/finishing filters 

Redundant SNP filter: SNPs in different contigs of the same gene which have the same base
change and surrounding sequence are flagged as redundant. To accommodate possible splice variants,
this redundancy filter also applies to SNPs for which the surrounding sequence matches on only one
side.

T cell receptor/immunoglobulin filters

Sequences containing SNPs are filtered to remove SNPs in sequences that are homologs to T
cell receptors and immunoglobulin genes because both types of genes have hypervariable regions
which could result in false positives.

Output file 

SNP related data: with each candidate SNP a variety of data is kept, including the number and
sources of all contributing sequences (for example gene album, HIPS, FL, WashU/Merck, etc.), the
surrounding sequence, measures of the ratio and quality scores for the "best" sequence representing
each allele, etc.

Sequence related data: for each sequence associated with each SNP, the following data is kept:
the distance in each direction to the end of the sequence, the distance in each direction to the
next base different from the consensus and passing the initial quality filters, the library, tissue ID,
donor ID and comments (for example tumor, diseases, normal).

These methods have been described in patent applications entitled "Method for the
Identification of Sequence Polymorphisms using Polynucleotide Sequence Databases, and Single
Nucleotide Polymorphisms Identified Thereby" (Attorney Docket Nos. GX-0006 P and GX-0010 P),
and are hereby incorporated by reference.

b. Identification of polymorphisms in osteoarthritis associated genes by SSCP 
The invention provides methods for detecting the presence of polymorphisms in candidate 
genes of the invention. The invention also provides methods for distinguishing polymorphisms which
contribute to a particular disease (e.g. osteoarthritis) over polymorphisms which do not contribute to 
the disease. 

1. Identification of Polymorphisms in Candidate Genes 

Identification of polymorphisms in a candidate gene, according to the invention, will involve the
steps of isolating the candidate gene, determining its genomic structure and identifying polymorphisms
in the DNA sequences in any portion of the entire protein-coding region. The invention also provides
methods for identifying polymorphisms in the DNA sequences corresponding to RNA splice junctions.
The invention also provides methods for identifying polymorphisms in the DNA sequence
corresponding to the regulatory (promoter) region of the candidate gene.

A candidate gene is isolated by cloning methods well known in the art (described above).
Preferably the genomic structure of a candidate gene is determined by Southern blot analysis, as
described in Section C. It is expected that the entire sequence of an open reading frame (ORF) of an
average entire gene can be spanned by 16 PCR-amplified DNA fragments or amplimers of an
average length of 225 bp. It is expected that a smaller gene can be spanned by 1-2 amplimers and that
>50 amplimers are required to span extremely large genes. Primers useful for production of the
amplimers of a particular candidate gene are designed based on preexisting knowledge of the
sequence of the wild type gene, according to the primer design strategies described in Section A
entitled "Design and Synthesis of Oligonucleotide Primers."

For PCR amplification of a region to be tested by SSCP it is preferable to design primers that
amplify overlapping regions of the candidate gene. If a sequence variation is located in a region of a
candidate gene that corresponds to the region to which the primers hybridize, the primers will likely not
bind, the region containing this sequence variation will not be amplified and the variation will not be
detected in PCR based assays. By producing overlapping amplimers it is expected that virtually all of
the sequence variations in a particular candidate gene will be detected. The amount of overlap in the
amplimers is somewhat variable (approximately 20%) and the precise location of the overlapping
regions will depend on the location of regions comprising a sequence that is an appropriate primer
sequence. It is a possibility that a polymorphism will be located at a position just adjacent to the primer
site. Consequently, sequence information will be available for only 20 bp on one side of the
polymorphism and for 104-279 bp on the other side of the polymorphism. However, this should be a
sufficient amount of sequence information to allow definition of a unique sequence context in which to
define the particular polymorphism.

Based on screening analysis of 92 samples (184 chromosomes), it is expected that about 50%
of the amplimers will demonstrate polymorphisms, and that approximately 80% of these amplimers will
detect changes at single positions while the remaining 20% will detect base changes at two positions.
Based on these estimates, it is expected that there will be approximately 10 sequence variations per
open reading frame. However, the number of amplimers that demonstrate polymorphisms will vary
depending on the number of individuals tested, the ethnicity and structure of the population being
tested, and the region of DNA being tested.

Preferably, each polymorphism will be detected in the context of an SSCP fragment.
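
The estimate of approximately 10 sequence variations per open reading frame follows from the figures stated above (16 amplimers per average ORF, about 50% of amplimers polymorphic, 80% of those with one variant position and 20% with two). The short calculation below only restates that arithmetic; the function name is hypothetical.

```python
def expected_variations_per_orf(amplimers=16, polymorphic_fraction=0.5,
                                single_site_fraction=0.8, double_site_fraction=0.2):
    """Expected number of variant positions detected across one average ORF."""
    polymorphic_amplimers = amplimers * polymorphic_fraction
    sites_per_polymorphic_amplimer = single_site_fraction * 1 + double_site_fraction * 2
    return polymorphic_amplimers * sites_per_polymorphic_amplimer

print(round(expected_variations_per_orf(), 1))  # 9.6, i.e. approximately 10
```

Eight polymorphic amplimers at an average of 1.2 variant positions each gives 9.6, consistent with the stated figure of approximately 10.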

Polymorphism analysis by fluorescent SSCP (fSSCP, described in detail in Section F entitled
"Identification and Characterization of Polymorphisms") uses PCR to generate an amplimer of DNA
to be studied. The region to be tested is defined as the region between the primers (e.g. the region that
is incorporated into the PCR product and reflects the sequence of the DNA sample being tested). The
PCR primers reflect the sequence of the DNA sample being tested and are incorporated into the PCR
product as one end of each strand of DNA in the PCR product. If a polymorphism occurs in a primer
binding site, either the PCR primer does not bind due to the mismatch and the PCR will not produce a
product, or the primer binds, an amplification step occurs wherein the primer is incorporated, but the
amplified product does not contain the polymorphism which occurs at the primer binding site.
Therefore, fSSCP provides a method of screening a DNA sequence located between PCR primers
for the presence of polymorphisms.

The sensitivity of the technique of fSSCP for detecting a polymorphism is affected by length,
such that there is a substantial decrease in the detection of polymorphisms in amplimers that are
greater than 300 bp in length. However, different conditions for performing SSCP at high sensitivity
with larger fragments, e.g. 800-1500 bp, have also been described. If the length of DNA screened per
amplimer is decreased then more amplimers are required to screen a region of a given size. Therefore,
efficient screening of a gene dictates that the lower limit of the size of an amplimer is 125 bp. To
attain specificity for a particular gene sequence, primers are usually 20-25 bp in length, and additional
criteria such as G:C content, and intra- and inter-primer complementarity are important considerations
in primer design (as described above). All of these considerations are addressed if the primer3
program (Copyright (c) 1996 Whitehead Institute for Biomedical Research) is employed to design
pairs of primers suitable for use in a single PCR reaction. Typically, program parameters are set so
that multiple amplimers are designed in the length range of 150-300 bp, with predicted primer melting
temperatures in the narrow range 60-62°C. The narrow temperature range increases the likelihood
that a single set of PCR conditions can be used to generate a wide variety of different amplimers.

If it is desirable to screen a contiguous stretch of DNA which is larger than the maximum
fragment size desired for sensitive polymorphism detection by fSSCP (300 bp), it is necessary to use
multiple amplimers (which are assayed separately) which span the region of interest. Since the primer
sites in an amplimer are not tested, these sequences need to be contained within another amplimer. To
test the primer sequence, overlapping amplimers are designed by an algorithm that evaluates a large
number of amplimers generated by the primer3 program for the optimum overlapping set according to
a cost function. Thus, a series of overlapping PCR amplification products can be used to test a
contiguous stretch of DNA. Constraints on primer design are such that the absolute minimum overlap
is rarely possible. As a result, some regions of overlap occur that result in 'double testing' of a
particular segment of DNA. The detection efficiency is affected by the sequence context of the
polymorphism; it is possible that a polymorphic site will be detected in only one of two different
amplimers which overlap the same site. One strategy that is useful for increasing polymorphism
detection efficiency is to design overlapping amplimers to generate 2-fold coverage of all sequences.

SSCP does not detect 100% of polymorphisms. The invention provides for detection of
polymorphisms with an efficiency of 95% under a single set of conditions using single coverage of
sequences; a 2-fold screening strategy can be employed if it is necessary to increase this detection
efficiency.
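
The requirement that each primer site be contained within the tested interior of a neighboring amplimer can be illustrated with an idealized tiling sketch. This is not the cost-function algorithm referred to above and does not model real primer design constraints: it assumes fixed amplimer and primer lengths (225 bp and 20 bp, values taken from elsewhere in this description), assumes flanking sequence is available outside the region, and uses a hypothetical function name.

```python
def tile_amplimers(region_start, region_end, amplimer_length=225, primer_length=20):
    """Return half-open (start, end) amplimer coordinates whose tested interiors
    (the part between the two primers) abut one another and cover the region.

    Adjacent amplimers overlap by two primer lengths, so each internal primer
    site falls inside the tested interior of the neighboring amplimer.
    """
    step = amplimer_length - 2 * primer_length  # length of the tested interior
    amplimers = []
    start = region_start - primer_length  # first tested interior begins at region_start
    while start + primer_length < region_end:
        amplimers.append((start, start + amplimer_length))
        start += step
    return amplimers
```

Under these assumptions the overlap between neighbors is 40 bp of a 225 bp amplimer, or roughly 18%, which is of the same order as the approximately 20% overlap mentioned earlier. A 1000 bp region would need six such amplimers; in practice primer placement constraints would make the overlaps uneven.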

It is expected that the polymorphism can be located and detected anywhere in the SSCP
fragment except in the regions at each end that correspond to the sequence of the PCR primers. The
precise location and identity of the sequence variation(s) of a particular SSCP fragment can be
confirmed by sequencing the fragment as described in Section D entitled "Isolation of a Wild Type
Gene". The sequence of a candidate gene will be compared to the known sequence of a wild-type
version of the gene by using the following DNA/protein sequence analysis programs and methods.

There are a large number of freely available methods for performing sequence comparisons.
These methods differ in their speed of execution, their sensitivity, and the type of comparisons they
are able to make. For example, one can compare two DNA sequences, two protein sequences, a
DNA sequence to a protein sequence by conceptual translation, or DNA sequences as if they were
protein sequences, again by conceptual translation. The BLAST suite of programs (Altschul et al.,
1990, J. Mol. Biol. 215:403) is commonly used to perform the above-referenced type of analysis.
Although the BLAST suite of programs provides a rapid method of determining multiple distinct
similarities between two sequences, these programs are not guaranteed to find an optimal solution
when comparing two sequences according to a particular set of parameters. PSI-BLAST is a more
sensitive variant of BLAST that operates by iteratively searching the database while simultaneously
refining the query pattern based on the results of the searches. Other packages of programs that are
available and which have different specific properties include the HMMER, SAM, WISE, STADEN
and FASTA packages, and the programs est_genome, dotter, e-PCR, Clustal, cross_match and phrap
(Pearson, 1996, Methods Enzymol. 266:227).

If sequence information is available for the intron-exon boundaries and for a region of the
intron (of approximately 30-150 bp) located immediately 5' of an intron-exon boundary, primers can be
designed to produce amplimers useful for identifying polymorphisms located in the RNA splice
junctions. Similarly, if the promoter region of a candidate gene has been sequenced, primers can be
designed to produce amplimers useful for identifying polymorphisms located in the promoter region.
Additional methods for detecting and isolating polymorphisms include, but are not limited to, fluorescent
polarization-TDI, mass spectroscopy, denaturing gradient gel electrophoresis, chemical cleavage of
mismatch, constant denaturant capillary electrophoresis, RNase cleavage, heteroduplex analysis,
sequencing by hybridization, DNA sequencing, representational difference analysis, and denaturing
high performance liquid chromatography, described below in Section F entitled "Identification and
Characterization of Polymorphisms".

2. Methods of Determining if a Polymorphism Contributes to Osteoarthritis

No two individuals (excluding identical twins or other clones) have the same sequence of
DNA in their genome. Variability in gene sequences between individuals accounts for many of the
obvious phenotypic differences (such as pigmentation of hair, skin, etc.) and many nonobvious ones
(such as drug tolerance and disease susceptibility). In a population, the DNA sequence that occurs at
the highest frequency at any given site is commonly referred to as the wild type sequence. The term
"wild type sequence" can be misleading, however, because in different populations an alternative form
of a DNA sequence may be predominant and thus considered wild type for that particular population.
DNA polymorphisms are located throughout the genome, within and between genes, and the various
forms may or may not result in differential gene function (as determined by comparing the function of
two alternative forms of the same sequence). Most polymorphisms do not alter gene function and are
called neutral polymorphisms. Some polymorphisms do have an effect on gene function, for example
by changing the amino acid sequence of a protein, or by altering control sequences such as promoters
or RNA splicing or degradation signals.

Polymorphisms can be used in genetic studies to identify a gene involved in a disease. If a
polymorphism alters a gene function such that it increases disease susceptibility, then it will be present
more often in individuals with the disease than in those without the disease. Alternatively, if a
particular DNA variant is protective against a disease, it will be found more often in individuals without
the disease than in those with the disease. Statistical methods are used to evaluate polymorphism
frequencies found in diseased as compared to normal populations, and provide a means for establishing
a causal link between a polymorphism and a phenotype. To detect a significant association between a
disease and a polymorphic site, different tests may be used with either genotypic or allelic distributions.
The simplest test consists of a t-test wherein the frequency of the polymorphic alleles in normal
individuals and individuals with the disease phenotype is compared. A comparison of the genotypic
distribution in normal individuals and individuals with the disease phenotype can also be performed
using a chi-square test of homogeneity. These tests are implemented in all commercially or freely
available statistical packages, for example SAS and S+, and are even included in Microsoft Excel.
More sophisticated analyses will be performed by incorporating covariates, using methods such as
linear regression or logistic regression, and by accounting for the information provided by adjacent
polymorphic sites (multipoint analysis). An example of this type of program is the freely available
program "Analyze" by JD Terwilliger (currently available at the WWW site
ftp://ftp.well.ox.ac.uk/pub/genetics/analyze).

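
As an illustration of the chi-square test of homogeneity mentioned above, the following sketch computes the test statistic for a table of genotype counts in case and control groups. It is a hand-written calculation rather than a call to any of the statistical packages named in the text; the function names and the genotype counts in the usage example are invented for illustration. The p-value helper is valid only for two degrees of freedom, which is the case for a table of two groups by three genotypes.

```python
import math

def chi_square_homogeneity(table):
    """table: one row of genotype counts per group, e.g. [cases, controls].
    Returns (chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom

def p_value_two_df(statistic):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-statistic / 2)

# Hypothetical genotype counts (AA, Aa, aa) for 100 cases and 100 controls.
statistic, df = chi_square_homogeneity([[50, 40, 10], [30, 50, 20]])
print(round(statistic, 3), df, round(p_value_two_df(statistic), 4))
```

With these invented counts the statistic is about 9.44 on 2 degrees of freedom, which would indicate that the genotype distributions of the two groups differ.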
If a polymorphism has a phenotypic effect, a bias will exist in the distribution of
polymorphisms between groups that have and do not have the disease phenotype. This manner of
analysis can be used to study a trait that is not necessarily a disease; any trait can be studied by
comparing a group with a particular phenotypic form of a trait to a group with a different phenotypic
form of that trait. It is important that the cases and controls are correctly matched with regard to
ethnicity, environmental influences, and other factors which could affect the phenotype being studied.
Studies which test polymorphism frequencies within groups exhibiting different phenotypes and use
statistical methods to compare the group polymorphism frequencies and identify correlations with
phenotypes are known as "association studies".

Some polymorphisms that occur in a single gene can alter the function of a gene sufficiently
such that the polymorphism results in a disease (monogenic disease). However, many common human
diseases are polygenic; that is, they are the result of complex interactions of various forms of multiple
genes. In the case of polygenic diseases, the alteration of a single gene may not be detrimental per se,
but in combination with certain sequence variants of other genes, this altered DNA sequence may
contribute to a disease phenotype. DNA variants leading to monogenic diseases are usually rare in a
population due to the process of natural selection against those carrying the disease gene. As variants
in genes that are involved in polygenic disease do not produce the disease phenotype unless they occur
in the appropriate combination with other gene variants, normal individuals can carry a subset of the
disease-contributing variants without suffering adverse effects. Thus, disease-contributing gene
variants that are associated with polygenic diseases may exist at a high frequency in a normal
population. Selection against these disease variant forms of a gene will only occur when they are
present in the appropriate disease-causing combination, and there may not necessarily be selection
against these gene variants in individuals carrying a subset of the disease-contributing variants. Neutral
DNA variants do not alter gene function or contribute to a disease, are under no selective pressure
and occur at variable frequencies within populations.

Monogenic diseases tend to be rare within the population, and therefore few patients may be
available for studies of these diseases. A polymorphism in a single specific gene is necessary and
usually sufficient to cause a monogenic disease, such that associations between the variant gene and
the phenotype are usually readily apparent. In cases where the expression of a mutation phenotype is
complete ("complete penetrance"), the polymorphism present in the disease gene will not be found
upon examination of a large number of normal individuals. If there is not complete penetrance then
some apparently normal individuals will contain the mutation; the difference in frequency of
occurrence of the variant gene in the disease group as compared to the normal population will reveal
that the variant is associated with the disease.

In polygenic diseases, variation at different genes occurs in a combination which alters
susceptibility to the disease. Although several genes may have variant forms which can contribute to a
disease phenotype, it is not always necessary for a contributing variant to be present at every gene
potentially contributing to the disease in a given affected individual. For example, a hypothetical
disease could be caused by a particular combination of variants at three of four genes, designated as
A, B, C, and D. Appropriate susceptibility variants in combination at any three of the genes can cause
the susceptibility, i.e. one person with increased susceptibility may have susceptibility variants in genes
A, B, and C, while another individual with increased susceptibility to the same disease will have
susceptibility variants in genes B, C, and D. Therefore, although not all affected individuals will have
the same susceptibility variants, the net result is that a diseased population will have susceptibility
variant forms of genes A, B, C, and D at a higher frequency than an unaffected population (as
detected by association studies).
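
The three-of-four example can be checked with a short enumeration. The following is a minimal sketch only. It assumes, purely for illustration, that each person either carries or does not carry a susceptibility variant at each gene, that all sixteen carrier combinations are equally common, and that carrying variants at three or more genes confers susceptibility. None of these assumptions comes from the specification, and the function name is hypothetical.

```python
from itertools import product

GENES = ("A", "B", "C", "D")


def variant_frequencies(min_variants=3):
    """Enumerate every carrier/non-carrier combination over the four genes and
    return the frequency of each susceptibility variant in the affected and
    unaffected groups (illustrative assumption: all combinations equally likely)."""
    affected, unaffected = [], []
    for genotype in product((0, 1), repeat=len(GENES)):
        group = affected if sum(genotype) >= min_variants else unaffected
        group.append(genotype)

    def freq(group):
        return {g: sum(ind[i] for ind in group) / len(group)
                for i, g in enumerate(GENES)}

    return freq(affected), freq(unaffected)


affected_freq, unaffected_freq = variant_frequencies()
print(affected_freq["A"], round(unaffected_freq["A"], 3))
```

Under these assumptions the variant at gene A is carried by 4 of the 5 affected combinations but only 4 of the 11 unaffected ones, which is the kind of frequency difference an association study is designed to detect.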

Unlike monogenic diseases which result from polymorphisms that are not present in control
populations, the polymorphisms which contribute to the polygenic disease are also present in a normal
population. As described in the example above, an individual with susceptibility polymorphisms in only
one or two of the genes potentially contributing to the disease susceptibility will be normal with regard
to disease susceptibility. Therefore, normal populations can be used to identify polymorphic regions of
the genome in the population, and these regions can then be specifically tested in larger patient and
control populations. Typically, a gene is analyzed for the presence of polymorphisms by testing
between 2 and 100 normal individuals in order to establish if a particular polymorphism is present for
that gene in the population. Once a polymorphic site(s) has been defined, the polymorphic site is then
tested in case (disease) and control (normal) populations and statistical analyses are performed to
identify polymorphisms which occur at significantly different frequencies in the two populations.

The determination of the statistical significance of polymorphism frequency differences is
dependent upon the size of the observed frequency difference between the populations, and on the
size of the populations being studied. If a significant difference is found, then it can be concluded that
an association exists between the polymorphism and the phenotype being studied. A statistically
significant difference is a frequency difference at a particular site between populations which would
be expected to occur by chance in only 5 out of 100 tests. That is, a difference which has a 95%
probability of being a true difference due to the effect of the gene.

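One common way to test whether a variant occurs at different frequencies in case and control populations is a chi-square test on a 2x2 table of allele counts. The sketch below is an illustration under stated assumptions and is not the statistical procedure specified by the invention. The function name and the example counts are hypothetical, and the p-value uses the closed form for one degree of freedom.

```python
import math


def allele_association(case_variant, case_total, control_variant, control_total):
    """Pearson chi-square test (1 degree of freedom) on variant vs. non-variant
    allele counts in cases and controls. Returns (chi2, p_value).
    Assumes both alleles are observed at least once overall."""
    table = [
        [case_variant, case_total - case_variant],
        [control_variant, control_total - control_variant],
    ]
    n = case_total + control_total
    row_totals = [case_total, control_total]
    col_totals = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 120 of 400 case alleles vs. 80 of 400 control alleles.
chi2, p = allele_association(120, 400, 80, 400)
print(round(chi2, 2), p < 0.05)
```

With these made-up counts the difference passes the 5-in-100 threshold described above. The same frequency difference in much smaller populations would not, which is why significance depends on population size as well as on the size of the difference.
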
The foregoing discussion describes a method of testing for an association between a 
polymorphism which is the direct contributor to a disease and the disease phenotype. However, 
polymorphisms which do not directly contribute to a disease can also be used to identify regions of the
genome which contain genes that contribute to the disease by virtue of their proximity to
disease-contributing polymorphisms.

In humans, DNA exists as 23 homologous pairs of linear molecules (chromosomes). 
Recombination is a process which results in reciprocal exchanges of short homologous DNA 
segments between these homologous DNA pairs. Only one of each of the 23 pairs of chromosomes is
inherited by the offspring. The inherited chromosome is thus made up of tandemly arrayed segments
of DNA derived from both of a pair of chromosomes. Consequently, DNA is transferred in segments
from one generation to the next. Although the boundaries of each inherited segment may vary in each
generation, the net effect is that sequences of DNA which are adjacent along the length of the
molecule are inherited together at a higher frequency than sequences that are farther apart. If a region
(continuous linear segment) of DNA has two or more polymorphisms that are close together, they will
be co-inherited at a higher frequency than polymorphisms that are farther apart, as they are more
likely to remain on the same segment of DNA during recombination. Therefore, if two or more
polymorphisms are close together, they will occur together at a higher frequency in a population than
would be expected by random segregation. This effect is known as linkage. Linkage studies are
performed using multiply affected individuals within families; the most commonly used approach is to
test markers located throughout the genome in many sets of affected sib pairs that share the same
phenotype. Markers which are located in the region of a genome that contributes to the phenotype will
be inherited in both siblings, along with the phenotype, at a higher frequency than expected by chance.
Studies wherein data from many such families are compared can be used to implicate a region of a
genome as one that contributes to a particular phenotype.

Linkage disequilibrium (LD) association studies provide another method for using
polymorphisms in genetic studies. The method of LD involves making a correlation, at the population
level, between the alleles (alternative polymorphic forms of the same sequence site) present at
different genomic sites. If site 1 has two variant forms, A and a, and site 2 has two variant forms, B
and b, the observation in a population that allele A at site 1 is more often found with allele B at site 2
than with allele b is an example of LD. If allele B is a disease-contributing polymorphism, then testing
at allele A may show an association with the disease.
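
The A/a and B/b example can be quantified with the standard disequilibrium coefficient D and the squared correlation r². The sketch below assumes phased haplotype counts are available. The function name and the counts are hypothetical and are not taken from the specification.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) from counts of the four two-site haplotypes.
    Assumes both sites are polymorphic in the sample."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total  # frequency of allele A at site 1
    p_B = (n_AB + n_aB) / total  # frequency of allele B at site 2
    d = p_AB - p_A * p_B         # excess of the A-B haplotype over independence
    r_squared = d ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return d, r_squared


# Hypothetical sample in which A is found with B more often than with b.
d, r2 = linkage_disequilibrium(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(round(d, 3), round(r2, 3))
```

A positive D means that A and B occur together more often than their individual frequencies predict. D is zero when the two sites are independent.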

Linkage disequilibrium may be generated in several ways. Maintenance of LD in a population
allows a disease association to be detected many generations after the formation of LD. The
maintenance of LD is explained by linkage: the closer the two loci, the longer (in terms of number of
generations) that particular LD is maintained. As a result, polymorphisms which do not directly
contribute to a disease can be used to identify regions of the genome which contain a
disease-contributing polymorphism. If a polymorphism affects gene function such that it contributes to a
phenotype being studied and is found to be associated with the phenotype, nearby (neutral)
polymorphisms which are in LD with the disease polymorphism may also show an association with the
disease. Conversely, if a polymorphism does not affect gene function but is found to be associated
with a particular phenotype, this polymorphism is in LD with a different, but adjacent polymorphism
that affects gene function such that it contributes to the phenotype being studied. If a neutral
polymorphism is always inherited with a phenotype-contributing polymorphism, then the strength of
the association of the neutral polymorphism to the phenotype will be equal to that of the polymorphism
which affects gene function and is contributing to the phenotype. A polymorphism which shows an
association with a phenotype (for instance with disease susceptibility) is a marker for that phenotype
and implicates the region in which the polymorphism resides as a region containing a polymorphism
which contributes to the phenotype. Additional flanking polymorphisms can be tested to determine the
precise location of the true phenotype-contributing variant.
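
The statement that closer loci keep their LD for more generations follows from the textbook decay relation D_t = D_0 (1 - c)^t, where c is the recombination fraction between the two loci. The sketch below assumes a large, randomly mating population with no selection. These assumptions and both function names are illustrative and do not come from the specification.

```python
import math


def ld_after(d0, recomb_fraction, generations):
    """Expected LD coefficient after the given number of generations."""
    return d0 * (1 - recomb_fraction) ** generations


def generations_to_decay(recomb_fraction, remaining=0.5):
    """Generations needed for LD to fall to the given fraction of its start value."""
    return math.log(remaining) / math.log(1 - recomb_fraction)


# Tightly linked loci (c = 0.001) vs. more loosely linked loci (c = 0.01).
print(round(generations_to_decay(0.001)), round(generations_to_decay(0.01)))
```

Under this model, LD between loci with a recombination fraction of 0.001 takes roughly ten times as many generations to halve as LD between loci with a recombination fraction of 0.01.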

Linkage studies on families and LD studies on populations have different degrees of
resolution with regard to defining the size of a DNA region which contains the
phenotype-contributing polymorphism. In general, linkage studies define an interval which potentially contains tens
to hundreds of genes, while LD studies have been used to implicate single genes in the development of
a particular phenotype.

3. Test Populations Useful for Polymorphism Genotyping

The invention provides methods of determining allelic frequencies by performing genotypic
analyses in appropriate test populations.
Study cohorts:

Osteoarthritis Progression Cohort
Derived from a population of normal women aged 45-65. The original aims of the study, started
in 1989, were to assess how many women around menopausal age would get arthritis and what factors
predispose them to developing it, and also to look into factors that may be associated with progression
of the disease.
A series of examinations, x-rays and questionnaires about lifestyle factors were carried out on
1003 women that were recruited to the study. This study has been going for 10 years. As a result, a
unique, world-renowned and well-respected study is available looking at the reasons why women develop
osteoarthritis, potential risk factors and the genetics of the disease.

Prospective Severe Outcomes Cohort ("case-control")

Five hundred joint replacement cases will be ascertained, as will age-, ethnicity- and gender-matched
controls. The clinical data envisaged are: HRT use, numbers of joints affected, occupation,
injury history, age, BMI.



The list of relevant studies is shown in the following table.

Study: (name not legible in source)
Population details: 100 progressors + 75 non-progressors, 100 normals, all female from the
progression cohort. Detailed clinical data, 10 yr. follow-up: joint-space narrowing/yr., joints
affected, BMD, fractures, CRP levels.
Reasonable objectives: Large genetic effects for fast OA progression, proof of principle.
Correlation with biomarkers. Possible novel target.
Timing: 6-8 months

Study: Biomarker study
Population details: 800 women from progression cohort. DNA, serum, urine, 5 biomarkers.
Reasonable objectives: Correlation of genetics with biomarkers - v. useful for clinical trials.
Timing: 12 months

Study: Progression hand & knee OA study
Population details: ~800 women from progression cohort. Detailed clinical data, joint-space
narrowing/yr., joints affected, BMD (hip and spine), fractures, CRP levels, full lipid measurements,
incidence of fractures (assessed by X-rays), 10 yr. follow-up radiographs for all patients.
Reasonable objectives: Genetic effects of OA progression. Risk of OA. Correlation with biomarkers.
Possible novel target. Genetic effects of osteoporosis risk, correlation with BMD. Possibly genetic
effects of lipid levels and CVD risk.
Timing: 18 months

Study: (name not legible in source)
Population details: ~500 cases (joint replacements) vs. 500 matched controls. Prospective study:
DNA + 2 biomarkers. Clinical data required: steroid use, # joints affected, occupation, injury
history, age, BMI.
Reasonable objectives: Large genetic effects for OA risk, proof of principle. Possible novel target.
Timing: ~6-12 months for collection

4. Assays Useful for Determining the Association of a Polymorphism with Osteoarthritis

Clinical parameters

There is a general consensus that radiological changes are the preferred method for
epidemiological studies on the basis of cross sectional and prospective correlations between severity of
X-ray changes with the presence of pain and loss of function. In osteoarthritis, the loss of cartilage
produces a narrowed space between bones. The pattern of joint space narrowing can help distinguish
between osteoarthritis and rheumatoid arthritis. Bone spurs (osteophytes) also help diagnose
osteoarthritis. Other relevant clinical end points are pain, disability, function, joint replacement and
maintenance of joint structure. Stages of disease progression are as follows:

Early stage: focal swelling of articular cartilage followed by the appearance of irregularities in the
surface.

Intermediate stage: progressive degradation and loss of articular cartilage. Also characterised by
fibrillation (vertical splitting), detachment (horizontal splitting) and thinning of the cartilage.

Late stage: Articular cartilage is almost completely destroyed. Bony outgrowths (osteophytes) occur
at the joint margins resulting in residual arthritis. Characterised by pain and limitation of joint
movement.

Clinical measurements of OA

Quantitative traits of interest for the study of OA and its progression are:

- Osteophyte count
- Joint space narrowing (mm/yr.)
- Number of joints affected
- Types of joints affected

In addition, a series of biochemical markers can provide valuable information, such as:

- COMP
- CRP
- HA
- Procollagen Type II
- Bone resorption markers (e.g. collagen cross-links)

Confounding factors

Most currently recognised environmental risk factors for prevalent knee OA (obesity, knee
injury, and physical activity) influence incidence more than radiographic progression. Furthermore,
these factors might selectively influence osteophyte formation more than joint space narrowing. These
findings are consistent with knee OA being initiated by joint injury, but with progression being a
consequence of impaired intrinsic repair capacity.

Other known confounding factors are steroid (glucocorticoid) use and, in women, hormone
replacement therapy. Glucocorticoids ameliorate erosion in animal OA models and suppress synthesis
of matrix metalloproteinases (Saito et al. 1999). Estrogen replacement therapy, on the other hand, has
been shown to have a moderate, but not statistically significant, protective effect against worsening of
OA both in the Chingford (Hart et al. 1999) and Framingham (Zhang et al. 1998) studies.

5. Methods of Genotyping Polymorphisms 

The invention discloses methods for performing polymorphism genotyping. These methods can 
be used to detect the presence of a polymorphism in a sample comprising DNA or RNA. 

A DNA sample for analysis according to the invention may be prepared from any tissue or
cell line, and preparative procedures are well-known in the art. The preparation of genomic DNA is
performed as described in Section B.

RNA samples may also be useful for genotyping according to the invention. Isolation of RNA
can be performed according to the following methods.

RNA is purified from mammalian tissue according to the following method. Following removal
of the tissue of interest, pieces of tissue of ≤2 g are cut and quick frozen in liquid nitrogen to prevent
degradation of RNA. Upon the addition of a volume of 20 ml tissue guanidinium solution per 2 g of
tissue, tissue samples are ground in a tissuemizer with two or three 10-second bursts. To prepare
tissue guanidinium solution (1 L), 590.8 g guanidinium isothiocyanate is dissolved in approximately 400
ml DEPC-treated H2O. 25 ml of 2 M Tris-Cl, pH 7.5 (0.05 M final) and 20 ml Na2EDTA (0.01 M
final) are added, the solution is stirred overnight, the volume is adjusted to 950 ml, and 50 ml 2-ME is
added.
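
The final concentrations in this recipe can be sanity-checked with two small helpers. This is a sketch only. The formula weight of 118.16 g/mol for the guanidinium salt is an assumption not stated in the specification, and the helper names are hypothetical.

```python
GUANIDINIUM_THIOCYANATE_MW = 118.16  # g/mol; assumed formula weight, not from the source


def molarity_from_mass(mass_g, mw_g_per_mol, final_volume_l):
    """Molar concentration of a solid dissolved to the given final volume."""
    return mass_g / mw_g_per_mol / final_volume_l


def diluted_concentration(stock_molar, stock_volume_ml, final_volume_ml):
    """Final molar concentration after diluting a stock (C1 * V1 = C2 * V2)."""
    return stock_molar * stock_volume_ml / final_volume_ml


guanidinium_molar = molarity_from_mass(590.8, GUANIDINIUM_THIOCYANATE_MW, 1.0)
tris_molar = diluted_concentration(2.0, 25, 1000)
print(round(guanidinium_molar, 2), tris_molar)
```

With the assumed formula weight, 590.8 g in 1 L is about 5 M, and 25 ml of 2 M Tris-Cl diluted to 1 L gives the stated 0.05 M final concentration.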

Homogenized tissue samples are subjected to centrifugation for 10 min at 12,000 x g at 12°C.
The resulting supernatant is incubated for 2 min at 65°C in the presence of 0.1 volume of 20%
Sarkosyl, layered over 9 ml of a 5.7 M CsCl solution (0.1 g CsCl/ml), and separated by centrifugation
overnight at 113,000 x g at 22°C. After careful removal of the supernatant, the tube is inverted and
drained. The bottom of the tube (containing the RNA pellet) is placed in a 50 ml plastic tube and
incubated overnight (or longer) at 4°C in the presence of 3 ml tissue resuspension buffer (5 mM
EDTA, 0.5% (v/v) Sarkosyl, 5% (v/v) 2-ME) to allow complete resuspension of the RNA pellet. The
resulting RNA solution is extracted sequentially with 25:24:1 phenol/chloroform/isoamyl alcohol,
followed by 24:1 chloroform/isoamyl alcohol, precipitated by the addition of 3 M sodium acetate, pH
5.2, and 2.5 volumes of 100% ethanol, and resuspended in DEPC water (Chirgwin et al., 1979,
Biochemistry, 18: 5294).

Alternatively, RNA is isolated from mammalian tissue according to the following single step
protocol. The tissue of interest is prepared by homogenization in a glass teflon homogenizer in 1 ml
denaturing solution (4 M guanidinium thiocyanate, 25 mM sodium citrate, pH 7.0, 0.1 M 2-ME, 0.5%
(w/v) N-laurylsarcosine) per 100 mg tissue. Following transfer of the homogenate to a 5-ml
polypropylene tube, 0.1 ml of 2 M sodium acetate, pH 4, 1 ml water-saturated phenol, and 0.2 ml of
49:1 chloroform/isoamyl alcohol are added sequentially. The sample is mixed after the addition of
each component, and incubated for 15 min at 0-4°C after all components have been added. The
sample is separated by centrifugation for 20 min at 10,000 x g, 4°C, precipitated by the addition of 1 ml
of 100% isopropanol, incubated for 30 minutes at -20°C and pelleted by centrifugation for 10 minutes
at 10,000 x g, 4°C. The resulting RNA pellet is dissolved in 0.3 ml denaturing solution, transferred to a
microfuge tube, precipitated by the addition of 0.3 ml of 100% isopropanol for 30 minutes at -20°C,
and centrifuged for 10 minutes at 10,000 x g at 4°C. The RNA pellet is washed in 70% ethanol, dried,
and resuspended in 100-200 µl DEPC-treated water or DEPC-treated 0.5% SDS (Chomczynski and
Sacchi, 1987, Anal. Biochem., 162: 156).

RNA prepared according to either of these methods can be used for genotyping by the
methods of Northern blot analysis, S1 nuclease analysis and primer extension analysis (Ausubel et al.,
supra).

cDNA samples also may be prepared according to the invention, i.e., DNA that is
complementary to RNA such as mRNA. The preparation of cDNA is well-known and
well-documented in the prior art.

cDNA is prepared according to the following method. Total cellular RNA is isolated (as
described) and passed through a column of oligo(dT)-cellulose to isolate polyA RNA. The bound
polyA mRNAs are eluted from the column with a low ionic strength buffer. To produce cDNA
molecules, short deoxythymidine oligonucleotides (12-20 nucleotides) are hybridized to the polyA tails
to be used as primers for reverse transcriptase, an enzyme that uses RNA as a template for DNA
synthesis. Alternatively, mRNA species can be primed from many positions by using short
oligonucleotide fragments comprising numerous sequences complementary to the mRNA of interest as
primers for cDNA synthesis. The resultant RNA-DNA hybrid can be converted to a double stranded
DNA molecule by a variety of enzymatic steps well-known in the art (Watson et al., 1992,
Recombinant DNA, 2nd edition, Scientific American Books, New York).

Tissues or fluids which are useful for obtaining a DNA or RNA sample according to the
invention include but are not limited to plasma, serum, spinal fluid, lymph fluid, external secretions of
the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs, tissue and
samples of in vitro cell culture constituents.

Genotyping methods which are useful according to the invention, i.e., for the detection of
polymorphisms in nucleic acid samples isolated from individuals, are disclosed below.

Single Strand Conformation Polymorphism (SSCP) Screening and Fluorescent SSCP Screening
(fSSCP)

SSCP Analysis 

One technique for detecting DNA sequence variations in a biological sample is single strand
conformation polymorphism (SSCP) (Glavac et al., 1993, Hum. Mut. 2:404; Sheffield et al., 1993,
Genomics 16:325). SSCP is a simple and effective technique for the detection of single base changes.
This technique is based on the principle that single-stranded DNA molecules assume specific
sequence-based secondary structures (conformers) under nondenaturing conditions. The detection of
point mutations by single stranded conformation polymorphism is believed to be due to an alteration in
the structure of single stranded DNA. Molecules differing by only a single base substitution may
assume different conformers and migrate differently in a nondenaturing polyacrylamide gel. Single
stranded DNAs that contain sequence variations are identified by an abnormal mobility on
polyacrylamide gels. SSCP detects all types of point mutations and short insertions or deletions that
are located between the PCR primers (within the probe region) with apparently equal efficiency. This
technique has proven useful for detection of multiple mutations and polymorphisms, including SNPs.
SSCP sensitivity varies dramatically with the size of the DNA fragment being analyzed. The optimal
size fragment for sensitive detection by SSCP is approximately 125-300 bp.

The mobility of a single stranded DNA or double stranded DNA fragment during
electrophoresis through a gel matrix is dependent on its size. Small molecules migrate more rapidly
than large molecules because they pass through the pores in the matrix more easily. Conventionally,
electrophoresis of single stranded DNA involves a 'denaturing' gel which maintains the single
strandedness of the molecules. The denaturant is typically urea in polyacrylamide gels, and typically
formamide or sodium hydroxide in agarose gels. In contrast, according to the SSCP screening
protocol, single-stranded DNA is analyzed on a 'nondenaturing' gel. When single stranded DNA is
analyzed on a 'nondenaturing' gel, intramolecular interactions can occur. In particular, the single
stranded DNA is able to (partially) bind to itself. Consequently, DNA that is separated by
electrophoresis on an SSCP gel does not migrate as a linear molecule but rather, the mobility of the
DNA on an SSCP gel is governed by both its size and tertiary structure (conformation). The tertiary
structure of a single stranded DNA fragment is dependent on the sequence of the entire fragment.
Therefore, if a polymorphism exists in a given fragment, the conformation will usually be altered. The
technique is performed as follows.

One or more test DNA samples are prepared for analysis as described above, and subject to
PCR amplification. Oligonucleotide primers are designed and synthesized as described above.
Amplifications are performed in a total volume of 10 µl containing 50 mM KCl, 10 mM Tris-HCl, pH
9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, 0.02 mM of
nonradioactive dCTP, 0.05 µl [α-32P]dCTP (1,000-3,000 Ci/mmol, 10 mCi/ml), 0.2 µM each primer, 50
ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR
cycling profile is as follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing
temperature, 30 sec; 72°C, 45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing
temperature is different for each PCR primer pair and can be optimized according to the parameters
described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in
a total volume of 10 µl using the buffer provided by the manufacturer with 1 mM each of dGTP,
dATP, dTTP, 0.02 mM dCTP, 0.25 µl [α-32P]dCTP (1,000-3,000 Ci/mmol, 10 mCi/ml), 0.2 µM of
each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA
polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The
PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed
by a final extension at 72°C for 5 min. The length and temperature of each step of a PCR cycle, as
well as the number of cycles, is adjusted in accordance with the stringency requirements, as described
above.
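
The two cycling profiles can be written down as data, which makes it easy to compare their programmed hold times. The sketch below is illustrative: the class names are hypothetical, the primer-specific annealing temperature is left as `None`, and temperature ramp time between steps is ignored.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: Optional[float]  # None marks the primer-pair-specific annealing temperature
    seconds: int


@dataclass(frozen=True)
class PcrProfile:
    initial: Step
    cycle: Tuple[Step, ...]
    cycles: int
    final: Step

    def hold_seconds(self) -> int:
        """Total programmed hold time, excluding ramping between temperatures."""
        per_cycle = sum(step.seconds for step in self.cycle)
        return self.initial.seconds + self.cycles * per_cycle + self.final.seconds


TAQ = PcrProfile(
    initial=Step("preheat", 94, 180),
    cycle=(Step("denature", 94, 60), Step("anneal", None, 30), Step("extend", 72, 45)),
    cycles=35,
    final=Step("final extension", 72, 300),
)
VENT = PcrProfile(
    initial=Step("heat before enzyme addition", 98, 300),
    cycle=(Step("denature", 98, 60), Step("anneal", None, 45), Step("extend", 72, 60)),
    cycles=35,
    final=Step("final extension", 72, 300),
)

print(TAQ.hold_seconds() / 60, VENT.hold_seconds() / 60)
```

On these figures the Taq profile holds for 86.75 minutes and the Vent profile for 106.25 minutes. Actual run times are longer because of ramping.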

SSCP analysis is performed as follows. Ten µl of formamide dye (95% formamide, 20 mM
EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol) are added to 10 µl aliquots of radiolabeled
PCR product. Following denaturation at 100°C for 5 min, the reaction mixture is placed on ice. Two µl
aliquots are loaded onto 8% acrylamide:bisacrylamide (37.5:1), 0.5X TBE (45 mM Tris-borate, 1 mM
EDTA), 5% glycerol gels. Electrophoresis is carried out at 25 W at 4°C for 8 hours in 0.5X TBE.
Dried gels are exposed to X-OMAT AR film (Kodak) and the autoradiographs are analyzed and
scored for aberrant migration of bands (band shifts). SSCP may be optimized, as desired, as taught in
Glavac et al., 1993, Hum. Mut. 2:404.

fSSCP Analysis 

Techniques for screening multiple DNA samples simultaneously are also useful for performing
rapid genotyping analysis on a large number of samples according to the invention. By pooling and
multiplexing DNA samples in fluorescent SSCP (fSSCP) assays, the high throughput required for
detecting sequence variations in a large number of samples is achieved (Makino et al., 1992, PCR
Methods Appl. 2:10; Ellison et al., 1993, BioTechniques 15:684). According to the method of fSSCP,
PCR products are visualized and analyzed using an ABI fluorescent DNA sequencing machine.
Different primer pairs are identified by different color fluorochromes (4 different fluorochromes are
now available). fSSCP offers the following advantages over SSCP. Unlike SSCP, fSSCP does not
require handling of radioactive materials. Furthermore, the fSSCP technique allows for automated data
collection and automated data analysis programs that detect aberrantly migrating samples. In contrast, SSCP
evaluation involves visual examination by an individual, and does not provide a means for correcting
for lane to lane variations in electrophoretic conditions, as does fSSCP analysis.
fSSCP analysis is performed as follows.

Amplifications are performed in a total volume of 10 µl containing 50 mM KCl, 10 mM
Tris-HCl, pH 9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl2, 0.2 mM of dGTP, dATP, dTTP, dCTP,
0.2 µM primer labeled with one of the fluorochromes HEX, FAM, TET or JOE, 50 ng genomic DNA
(or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR cycling profile is as
follows: preheating to 94°C for 3 min followed by 94°C, 1 min; annealing temperature, 30 sec; 72°C,
45 sec for 35 cycles and a final extension at 72°C for 5 min. Annealing temperature is different for
each PCR primer pair. Amplifications using Vent Taq polymerase (New England Biolabs) are
performed in a total volume of 10 µl using the buffer provided by the manufacturer with 1 mM each of
dGTP, dATP, dTTP, dCTP, 0.2 µM primer labeled with one of the fluorochromes HEX, FAM, TET or
JOE, 50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA
polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The
PCR cycling profile is 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min for 35 cycles, followed
by a final extension at 72°C for 5 min. Annealing temperature is different for each PCR primer pair.
Two µl of fluorescent PCR products are added to 3 µl formamide dye (95% formamide, 20 mM
EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol), denatured at 100°C for 5 min, then placed on
ice. Thereafter, 0.5-1 µl of Genescan™ 1500 size markers are added as an internal standard. Two µl
of the mix is loaded onto 8% or 10% acrylamide:bisacrylamide (37.5:1), 0.5X TBE (45 mM
Tris-borate, 1 mM EDTA), 5% glycerol gels, and electrophoresis is performed on an ABI 377 DNA
sequencing machine. Gel temperature is maintained between 4° and 10°C by an external cooling unit
connected to the internal cooling plates and chambers. Electrophoresis is carried out at 2500-3500
volts for 4-10 hours in 0.5X TBE. Data is automatically collected and analyzed with Genescan and
Genotype analysis software (ABI).
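
Automated detection of aberrantly migrating samples can be illustrated with a simple outlier rule. The sketch below is not the algorithm used by the Genescan or Genotype software. It assumes each sample has already been reduced to a single mobility value normalized against the internal size standard, and it flags values that sit far from the median of the batch. The function name, the threshold and the example values are all hypothetical.

```python
from statistics import median


def flag_band_shifts(mobilities, threshold=3.5):
    """Return indices of samples whose normalized mobility is an outlier,
    using a robust z-score based on the median absolute deviation (MAD)."""
    values = list(mobilities)
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        return []  # no spread to judge against; nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical normalized mobilities for five samples of the same amplicon.
shifted = flag_band_shifts([100.0, 100.2, 99.9, 100.1, 103.5])
print(shifted)
```

A median-based rule is used here because a single shifted sample would distort a mean and standard deviation computed from a small batch.
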
The fSSCP procedure identifies regions of 150-300 base pairs containing a sequence
variation. To identify the exact sequence change, the fragment which demonstrates the aberrant
migration is amplified again from the same biological sample, using nonfluorescent primers. The
sequence is then determined using standard DNA sequencing methods well known to those skilled in
the art (Ausubel et al., supra).
Although SSCP and fSSCP techniques are preferred according to the invention, other
methods for detecting sequence variations, including DNA sequencing, can be employed. Additional
techniques for detecting DNA sequence variations useful according to the invention are described
below.

Fluorescence Polarization-TDI

Fluorescence polarization-TDI is another preferred technique according to the
invention for the detection of sequence variations. Template-directed primer extension is a dideoxy
chain terminating DNA sequencing protocol designed to ascertain the nature of the one base
immediately 3' to the sequencing primer that is annealed to the target DNA immediately upstream
from the polymorphic site. In the presence of DNA polymerase and the appropriate
dideoxyribonucleoside triphosphate (ddNTP), the primer is extended specifically by one base as
dictated by the target DNA sequence at the polymorphic site. By determining which ddNTP is
incorporated, the alleles present in the target DNA can be determined.

Fluorescence polarization is based on the observation that when a fluorescent molecule is
excited by plane-polarized light, it emits polarized fluorescent light into a fixed plane if the molecules
remain stationary between excitation and emission. However, because the molecule rotates and
tumbles in solution, fluorescence polarization is not observed fully by an external detector. The
fluorescence polarization of a molecule is proportional to the molecule's rotational relaxation time,
which is related to the viscosity of the solvent, absolute temperature, molecular volume, and the gas
constant. If the viscosity and temperature are held constant, then fluorescence polarization is directly
proportional to the molecular volume, which is directly proportional to the molecular weight. If the
fluorescent molecule is large (with high molecular weight), it rotates and tumbles more slowly in
solution and fluorescence polarization is preserved. If the molecule is small (with low molecular
weight), it rotates and tumbles faster and fluorescence polarization is largely lost (depolarized).
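
The dependence on viscosity, temperature, molecular volume and the gas constant is commonly written as a rotational relaxation time of 3ηV/(RT). The specification names the four quantities but does not give this formula, so the sketch below should be read as an assumption drawn from the general fluorescence polarization literature. The function name and the example values are hypothetical.

```python
GAS_CONSTANT = 8.314462618  # J/(mol*K)


def rotational_relaxation_time(viscosity_pa_s, molar_volume_m3_per_mol, temperature_k):
    """Rotational relaxation time in seconds, assuming the form 3*eta*V/(R*T)."""
    return 3 * viscosity_pa_s * molar_volume_m3_per_mol / (GAS_CONSTANT * temperature_k)


# Illustrative values: water-like viscosity at 25 degrees C, two molar volumes.
small = rotational_relaxation_time(1.0e-3, 1.0e-3, 298.15)
large = rotational_relaxation_time(1.0e-3, 1.0e-2, 298.15)
print(large / small)
```

At fixed viscosity and temperature, a tenfold larger molar volume gives a tenfold longer relaxation time. That is why a dye incorporated into a primer remains more polarized than the free dye terminator.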

In the FP-TDI assay, the sequencing primer is an unmodified primer with its 3' end
immediately upstream from a polymorphic or mutation site. When incubated in the presence of
ddNTPs labeled with different fluorophores, the allele-specific dye ddNTP is incorporated onto the TDI
primer in the presence of DNA polymerase and target DNA. The genotype of the target DNA
molecule can be determined simply by exciting the fluorescent dye in the reaction and determining
whether a change in fluorescence polarization occurs (Chen et al., 1999, Genome Res., 9:492).
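
The genotype call can be sketched as a threshold rule on the polarization change observed for each of the two allele-specific dyes. This is a hedged illustration rather than the scoring method of the cited assay. The function name, the millipolarization (mP) threshold and the example readings are all hypothetical.

```python
def call_genotype(delta_fp, alleles=("G", "A"), threshold_mp=40.0):
    """Call a genotype from the FP increase (in mP, relative to a negative
    control) measured for the dye-ddNTP matching each allele.
    The threshold is an illustrative assumption."""
    incorporated = [a for a in alleles if delta_fp.get(a, 0.0) >= threshold_mp]
    if not incorporated:
        return "no call"
    if len(incorporated) == 1:
        incorporated = incorporated * 2  # one dye incorporated: homozygous
    return "/".join(incorporated)


print(call_genotype({"G": 120.0, "A": 5.0}))
print(call_genotype({"G": 90.0, "A": 85.0}))
```

A rise in polarization for only one dye indicates a homozygote for that allele, a rise for both indicates a heterozygote, and a rise for neither is left uncalled.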

One or more test DNA samples are prepared for analysis as described above, and subjected to
PCR amplification. Oligonucleotide primers are designed and synthesized as described above.
Amplifications are performed in a total volume of 10 µl containing 50 mM KCl, 10 mM Tris-HCl, pH
9.0 (at 25°C), 0.1% Triton X-100, 1.5 mM MgCl₂, 0.2 mM each of dGTP, dATP and dTTP, 0.02 mM of
non-radioactive dCTP, 0.05 µl [α-³²P] dCTP (1,000-3,000 Ci mmol⁻¹, 10 mCi ml⁻¹), 0.2 µM each primer,
50 ng genomic DNA (or 1 ng of cloned DNA template) and 0.1 U Taq DNA polymerase. The PCR
cycling profile is as follows: preheating to 94°C for 3 min, followed by 35 cycles of 94°C, 1 min;
annealing temperature, 30 sec; 72°C, 45 sec, and a final extension at 72°C for 5 min. Annealing
temperature is different for each PCR primer pair and can be optimized according to the parameters
described above. Amplifications using Vent Taq polymerase (New England Biolabs) are performed in
a total volume of 10 µl using the buffer provided by the manufacturer with 1 mM each of dGTP,
dATP and dTTP, 0.02 mM dCTP, 0.25 µl [α-³²P] dCTP (1,000-3,000 Ci mmol⁻¹, 10 mCi ml⁻¹), 0.2 µM of
each primer, 50 ng of genomic DNA (or 1 ng of cloned DNA template) and 0.1 U of Vent Taq DNA
polymerase. Samples are heated to 98°C for 5 min prior to addition of enzyme and nucleotides. The
PCR cycling profile is 35 cycles of 98°C, 1 min; annealing temperature, 45 sec; 72°C, 1 min, followed
by a final extension at 72°C for 5 min. The length and temperature of each step of a PCR cycle, as
well as the number of cycles, are adjusted in accordance with the stringency requirements, as described
above.
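
As a rough check on the two cycling profiles above, the programmed hold times can be summed. This is only a sketch: it ignores ramp times between temperatures, which depend on the thermal cycler, and the function and variable names are illustrative. The 5 min pre-heat of the Vent samples takes place before enzyme addition and is left out of the total.

```python
def cycling_seconds(initial, steps, cycles, final):
    """Total programmed hold time in seconds (ramp times are ignored)."""
    return initial + cycles * sum(steps) + final


# Taq profile: 94 C 3 min; 35 x (1 min, 30 sec, 45 sec); 72 C 5 min
taq = cycling_seconds(initial=3 * 60, steps=[60, 30, 45], cycles=35, final=5 * 60)

# Vent profile: 35 x (1 min, 45 sec, 1 min); 72 C 5 min
vent = cycling_seconds(initial=0, steps=[60, 45, 60], cycles=35, final=5 * 60)

print(taq / 60)   # 86.75 minutes
print(vent / 60)  # 101.25 minutes
```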

Following PCR amplification, unused PCR primers and dNTPs are destroyed by adding 2 µl of
PCR product to 2 µl of SAP/Exonuclease cocktail (0.1 U shrimp alkaline phosphatase (1
U/µl, Amersham Pharmacia Biotech, Inc., Piscataway, NJ) and 0.2 U E. coli exonuclease I (10 U/µl,
Amersham) in SAP buffer (20 mM Tris-HCl, pH 8.0; 10 mM MgCl₂, Amersham)) per well of a 384-well
black PCR plate (ABT). The mixtures are incubated at 37°C for 60 min before the enzymes are heat
inactivated at 95°C for 15 min. The mixture is held at 4°C until used in the FP-TDI assay.

To the enzymatically treated PCR product is added 2 µl of TDI reaction cocktail containing TDI
buffer (50 mM Tris-HCl (pH 9.0), 50 mM KCl, 5 mM NaCl, 2 mM MgCl₂, 8% glycerol), 1 µM TDI
primer, 12.5 nM of each of two allele-specific dye-labeled ddNTPs (ROX-ddGTP, BFL-ddATP,
Tamra-ddGTP, or R6G-ddUTP; NEN Life Science Products, Inc., Boston, MA), and 0.32 U Thermo
Sequenase (Amersham). The reaction mixtures are incubated at 94°C for 15 min, followed by 34
cycles of 94°C for 30 seconds and 55°C for 15 seconds. Upon completion of the reaction cycles, the
samples are held at 4°C.

After the primer extension reaction, 24 µl of TE buffer/methanol (2:1) is added to each
sample well, and the fluorescence polarization is measured using an LJL Analyst (LJL Biosystems,
Sunnyvale, CA).

Denaturing Gradient Gel Electrophoresis

Denaturing gradient gel electrophoresis (DGGE) is a gel system which allows electrophoretic
separation of DNA fragments differing in sequence by a single base pair. The separation is based
upon differences in the temperature of strand dissociation of the wild-type and mutant molecules.
During electrophoresis, fragments migrating through the gel are exposed to an increasing
concentration of denaturant in the gel. When the DNA fragments are exposed to a critical level of
denaturant, the DNA strands begin to dissociate. This dissociation causes a significant reduction in the
mobility of the fragment. The position in the gel at which the level of denaturant is critical for a
particular DNA fragment is a function of the Tm of the DNA fragment and is therefore different for
wild-type versus mutant fragments. Consequently, upon migration to the position at which the level of
denaturant is at the critical point for either the wild-type or the mutant fragment, the mobility of these
two molecules will become different, thus resulting in their separation. The mutation detection rate of
DGGE approaches 100%. Although the technique of DGGE is relatively simple to perform, and does
not require radioisotopes or toxic chemicals, it does require some specialized equipment. Furthermore,
DGGE can only be used to analyze fragments between 100 and 500 bp due to the resolution limit of
polyacrylamide gels. DGGE is advantageous over other methods useful for detecting sequence
variations because the behavior of DNA molecules on DGGE gels can be modeled by computer,
thereby making it possible to accurately predict the detectability of a mutation in a given fragment.
Genomic DNA fragments can be efficiently transferred from the gel following DGGE as described in
US Patent No. 5,190,856.

Chemical Cleavage of Mismatches 

Chemical cleavage of mismatch (CCM) is another technique for detection of sequence
variations that is useful according to the invention. CCM is based upon the ability of hydroxylamine
and osmium tetroxide to react with the mismatch in a DNA heteroduplex and the ability of piperidine
to cleave the heteroduplex at the point of mismatch. According to the method of CCM, sequence
variations are detected by the appearance of fragments that are smaller than the untreated
heteroduplex following denaturing polyacrylamide gel electrophoresis.

DNA fragments up to 1 kb in size can be analyzed by CCM with a probable 100% detection
rate for sequence variation. CCM is particularly useful for either detecting all of the sequence
variations in a particular fragment of DNA or for determining that there are no sequence variations in
a particular fragment of DNA.

Constant Denaturant Capillary Electrophoresis (CDCE) Analysis

CDCE analysis is particularly useful in high throughput screening, i.e., wherein large numbers
of DNA samples are analyzed. CDCE analysis combines several elements of both replaceable linear
polyacrylamide capillary electrophoresis and constant denaturant gel electrophoresis. The technique of
CDCE is a rapid, high resolution procedure that demonstrates a high dynamic range, and is
automatable. The method of CDCE, as described in detail in Khrapko et al., 1994, Nucleic Acids Res.
22:364, involves the use of a zone of constant temperature and a denaturant concentration in capillary
electrophoresis. Linear polyacrylamide gel electrophoresis is performed at viscosity levels that permit
facile replacement of the matrix after each run. For a typical 100 bp fragment of DNA, point
mutation-containing heteroduplexes are separated from wild type homoduplexes in less than 30
minutes. Using laser-induced fluorescence to detect fluorescent-tagged DNA, the system has an
absolute limit of detection of 3 × 10⁴ molecules with a linear dynamic range of six orders of magnitude.
The relative limit of detection is about 3/10,000, i.e., 100,000 mutant sequences are recognized among
3 × 10⁸ wild type sequences. This approach is applicable to analysis of low frequency mutations, and


to genetic screening of pooled samples for detection of rare variants.

RNase Cleavage

An additional method for genotyping that is useful according to the invention is RNase
Cleavage. Various ribonuclease enzymes, including RNase A, RNase T1 and RNase T2, specifically
digest single stranded RNA. When RNA is annealed to form double stranded RNA or an RNA/DNA
duplex, it can no longer be digested with these enzymes. However, when a mismatch is present in the
double stranded molecule, cleavage at the point of mismatch may occur.

RNase Cleavage is preferably performed with RNase A. Ribonuclease A specifically digests
single stranded RNA but can also cleave heteroduplex molecules at the point of mismatch. The extent
of cleavage at single base mismatches depends on both the type of mismatch and the sequence of
DNA flanking the mismatch. Sequence variations leading to mismatch are indicated by the presence
of fragments that are smaller than the uncleaved heteroduplex on denaturing polyacrylamide gels.

According to the invention, RNase Cleavage involves forming a heteroduplex between a
radiolabeled single stranded RNA probe (riboprobe) and a PCR product derived from a biological
sample. If a point mutation is present in the PCR product, following treatment of the resulting
RNA/DNA heteroduplex with RNase A, the RNA strand of the duplex may be cleaved. The sample
is then denatured by heating and analyzed on a denaturing polyacrylamide gel. If the RNA probe has
not been cleaved, it will be the same size as the PCR product. If the probe has been cleaved, it will be
smaller than the PCR product. RNase Cleavage can be used to easily detect a 1 bp deletion.
However, small insertions may not be as easily detected as small deletions by RNase Cleavage, as
'looping-out' occurs on the target strand rather than the probe strand.

Heteroduplex Analysis 

Another method for genotyping according to the invention is heteroduplex analysis.
Heteroduplex molecules, i.e., double stranded DNA molecules containing a mismatch, can be
separated from homoduplex molecules on ordinary gels. The exact rate of detection of sequence
variations by heteroduplex analysis is unknown, but is clearly significantly lower than 100%.
Presumably, the sequence of DNA flanking the mismatch, rather than the actual mismatch, affects the
detectability. Mismatches that are located in the middle of a DNA fragment are detected most easily.
Although heteroduplex analysis is less sensitive than some of the other genotyping methods described,
it may be considered useful according to the invention due to its simplicity.

Mismatch Repair Detection (MRD)

Another technique that is useful for genotyping according to the invention is mismatch repair
detection (MRD). MRD is an in vivo method that detects DNA sequence variation by the occurrence
of a change in bacterial colony color. DNA fragments to be screened for variation are cloned into two
MRD plasmids, and bacteria are transformed with heteroduplexes of these constructs. The resulting
colonies are blue in the absence of a mismatch and white in the presence of a mismatch. MRD can be
used to detect a single mismatch in a DNA fragment as large as 10 kb in size. MRD permits
high-throughput screening of genetic mutations, and is described in detail in Faham et al., 1995, Genome
Research 5:474.

Mismatch Recognition by DNA Repair Enzymes 

Another technique that is useful for detecting sequence variations according to the invention is
Mismatch Recognition by DNA Repair Enzymes. The E. coli mismatch correction systems are
well understood. Three of the proteins required for the methyl-directed DNA repair pathway (MutS, MutL
and MutH) are sufficient to recognize 7 of the possible 8 single base-pair mismatches (C/C mismatches
are not recognized) and cut/nick the DNA at the nearest GATC sequence. The MutY protein, which
is involved in a distinct repair system, can also be used to detect A/G and A/C mismatches. Some
mammalian enzymes are also useful for mismatch recognition: thymidine glycosylase can recognize all
types of T mismatch, and 'all-type endonuclease' or Topoisomerase I is capable of detecting all 8
mismatches, but does so with varying efficiencies, depending on both the type of mismatch and the
neighboring sequence.

The MutS gene product is the methyl-directed repair protein which binds to the mismatch.
Purified MutS protein has been used to detect mutations by several different methods. Gel mobility
assays can be performed in which DNA bound to the MutS protein migrates more slowly through an
acrylamide gel than free DNA. This method has been used to detect single base mismatches.

An alternative method for the use of MutS in mismatch recognition, which does not require gel
electrophoresis, involves the immobilization of MutS protein on nitrocellulose membranes. Labeled
heteroduplexed DNA is used to probe the membrane in a dot-blot format. When both DNA strands
are used, all mismatches can be recognized by binding of the DNA to the protein attached to the
membrane. Although C/C mismatches are not detected, the corresponding G/G mismatch derived
from the other strand is recognized. This technique is particularly useful because it is simple,
inexpensive, and amenable to automation. However, the detection efficiency of this method may be
limited by the size of the DNA fragment. In particular, this method works well for very short
fragments.

Sequencing by Hybridization (SBH)

An alternative method for detecting sequence variations according to the invention is
sequencing by hybridization (SBH). According to this method, arrays of short (8-10 base long)
oligonucleotides are immobilized on a solid support in a manner similar to the reverse dot-blot protocol,
and probed with a target DNA fragment. In particular, oligonucleotides are synthesized together and
directly onto the support.

The synthesis system begins with a silicon chip coated with a nucleotide linked to a
light-sensitive chemical blocking group. Particular grid co-ordinates are then illuminated, removing the
blocking group at these positions. The chip is then exposed to the next photoprotected nucleotide, which
polymerizes onto the exposed nucleotides.

In this manner, as a result of successive rounds of nucleotide additions, oligonucleotides of
different sequences can be synthesized at different positions on the solid support. Thirty-two cycles of
specific additions (i.e., 8 additions of each of the four nucleotides) should enable the production of all
65,536 possible 8-mer oligonucleotides at defined positions on the chip.
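
The counts quoted above follow from simple combinatorics (4⁸ = 65,536 probes and 4 × 8 = 32 addition cycles). A minimal sketch that enumerates the probe set and shows which 8-mers a target would be expected to light up is given below. The helper names and the 16-base toy target are illustrative assumptions, and perfect-match hybridization on a single strand is assumed.

```python
from itertools import product

BASES = "ACGT"


def all_kmers(k):
    """Every possible oligonucleotide of length k (the full probe set)."""
    return ["".join(p) for p in product(BASES, repeat=k)]


def spectrum(seq, k):
    """Set of k-mers contained in seq, i.e. the probes expected to hybridize."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def lost_probes(wild_type, variant, k):
    """Probes that match the wild-type target but not the variant target."""
    return spectrum(wild_type, k) - spectrum(variant, k)


probes = all_kmers(8)
print(len(probes))     # 65536
print(len(BASES) * 8)  # 32 addition cycles

wild_type = "ACGTTGCAAGGCTTAC"                # toy target, not a real sequence
variant = wild_type[:8] + "T" + wild_type[9:]  # single base substitution
print(sorted(lost_probes(wild_type, variant, 8)))
```

A single base substitution changes at most k of the overlapping k-mers, which is why the pattern of lost and gained signals localizes the variant.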

When the chip is probed with a DNA molecule, e.g., a fluorescently labeled PCR product,
fully matched hybrids should give a high intensity of fluorescence and hybrids with one or more
mismatches should give substantially less intense fluorescence. The combination of the position and
intensity of the signals on the chip enables computers to derive the sequence of the DNA molecule
being analyzed for the presence of sequence variations.

Allele-Specific Oligonucleotide Hybridization

The technique of allele-specific oligonucleotide (ASO) hybridization or the 'dot-blot' is also
useful for genotyping according to the invention. Under specific hybridization conditions, an
oligonucleotide will only bind to a PCR product if the two are 100% identical. A single base pair
mismatch is sufficient to prevent hybridization. A pair of oligonucleotides, one carrying the wild type
base and the other carrying a single base change, as compared to the wild type sequence, can be used
to determine if a PCR product is homozygous wild type, heterozygous or homozygous mutant for a
particular base change. When performing conventional dot blots, the PCR product is fixed onto a nylon
membrane and probed with a labeled oligonucleotide. When performing a 'reverse dot blot', an
oligonucleotide is fixed to a membrane and probed with a labeled PCR product. The probe may be
isotopically labeled, or non-isotopically labeled. The technique allows for the genotyping of multiple
PCR amplified samples for the presence of a single base change.

Allele-Specific PCR

Many methods for identifying sequence variations involve the analysis of PCR-amplified
DNA. The allele-specific polymerase chain reaction (also called the amplification refractory mutation
system or ARMS) comprises an assay that occurs during the PCR reaction itself. ARMS requires the
use of sequence-specific PCR primers which differ from each other at their terminal 3' nucleotide and
are designed to amplify only the normal allele in one reaction, and only the mutant allele in another
reaction. When the 3' end of a specific primer is 100% identical to the target, amplification occurs.
When the 3' end of a specific primer is not 100% identical to the target, amplification does not occur.
Agarose gel electrophoresis is used to detect the presence of an amplified product. A heterozygous
sample is characterized by amplification products in both reactions, a homozygous wild-type sample
generates product in only the normal reaction, and a
homozygous mutant sample generates product in only the mutant reaction.
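
The scoring rule for the two ARMS reactions can be written out as a small truth table. The sketch below uses the idealized all-or-nothing rule stated in the text (extension only when the primer's 3' terminal base matches the allele); the function names are illustrative, and both the primer base and the allele base are expressed on the same strand for simplicity.

```python
def amplifies(primer_3prime_base, allele_base):
    """Idealized ARMS rule: product forms only with a perfect 3' terminal match."""
    return primer_3prime_base == allele_base


def arms_genotype(sample_alleles, normal_base, mutant_base):
    """Score the normal-specific and mutant-specific reactions for one sample."""
    normal_product = any(amplifies(normal_base, a) for a in sample_alleles)
    mutant_product = any(amplifies(mutant_base, a) for a in sample_alleles)
    if normal_product and mutant_product:
        return "heterozygous"
    if normal_product:
        return "homozygous wild type"
    if mutant_product:
        return "homozygous mutant"
    return "no amplification"


print(arms_genotype(("G", "G"), normal_base="G", mutant_base="A"))
print(arms_genotype(("G", "A"), normal_base="G", mutant_base="A"))
print(arms_genotype(("A", "A"), normal_base="G", mutant_base="A"))
```

The "no amplification" outcome corresponds to a failed reaction or an unexpected third allele, which in practice is usually distinguished with a control amplicon.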

This technique can be modified so that the 5' ends of the allele-specific primers are labeled
with different fluorescent labels, and the 5' ends of the common primers are biotin labeled. According
to this alternate protocol, the wild-type specific and the mutant-specific reactions are performed in a
single tube. The advantages of this approach are that a gel electrophoresis step is not required, and the
method is amenable to automation.

Primer-Introduced Restriction Analysis

The method of primer-introduced restriction analysis (PIRA) can also be used for genotyping
according to the invention. PIRA is a technique which allows known sequence variations to be
detected by restriction digestion. By introducing a base change close to the position of a known
sequence variation (for example by using a PCR primer containing a mismatch, as compared to the
target sequence), it is possible to create a restriction endonuclease recognition site that indicates the
presence of a particular sequence change. The combination of the altered base in the primer sequence
and the altered base at the mutation site creates a new restriction enzyme target site. This approach
may be used to create a new restriction enzyme site in either the wild-type allele or the mutant allele.
If a novel restriction enzyme site is introduced in the mutant allele then, following digestion with the
appropriate restriction enzyme, the homozygous wild-type form would produce a single band of the
full-length size, the homozygous mutant form would produce a single band of the reduced size and the
heterozygous form would produce both full length and reduced sized bands. Band size will be analyzed
by gel electrophoresis.

Oligonucleotide Ligation Assay

The technique of oligonucleotide ligation can also be used for genotyping according to the
invention.

The method of oligonucleotide ligation is based on the following observations. If two
oligonucleotides are annealed to a strand of DNA and are exactly juxtaposed, they can be joined by
the enzyme DNA ligase. If there is a single base pair mismatch at the junction of the two
oligonucleotides then ligation will not occur. According to the method of oligonucleotide ligation, the
two oligonucleotides used in the assay are modified by the addition of two different labels. According
to this method, the assay for a ligated product involves detecting a ligated product by assaying for the
appearance of the labels of the two oligonucleotides on a single molecule rather than visualization of a
new, larger sized DNA fragment by gel electrophoresis.

When ligation reactions are conducted in 96-well microtiter plates and ligation is scored by
ELISA, the oligonucleotide ligation assay can be performed by a robot and the results can be analyzed
by a plate reader and fed directly into a computer. This method is therefore extremely useful for
detecting the presence of a sequence variation in a large number of samples. The oligonucleotide
ligation assay is performed on PCR-amplified DNA. A modification of this assay, termed the ligase
chain reaction, is performed on genomic DNA and involves amplification with a thermostable DNA
ligase.

Direct DNA Sequencing

Genotyping according to the invention may also be carried out by directly sequencing the
DNA sample in the region of the gene of interest, using DNA sequencing procedures well-known in
the art (described above in Section D, entitled "Isolation of a Wild Type Gene").

Mini-Sequencing

The technique of mini-sequencing (also known as single nucleotide primer extension) can also
be used to detect any known point mutation, deletion or insertion, according to the invention. Obtaining
sequence information for just a single base pair only requires the sequencing of that particular base.
This can be done by including only one base in the sequencing reaction rather than all four. When this
base is labeled and complementary to the first template base immediately beyond the 3' end of the
primer (on the target strand), the label will be incorporated; when it is not complementary, the label
will not be incorporated. Thus, a given base pair can be sequenced on the basis of label
incorporation or failure of incorporation without the need for electrophoretic size separation.

5' Nuclease Assay

Genotyping according to the invention can also be performed by the method of 5' nuclease
assay. The 5' nuclease assay is a technique that monitors the extent of amplification in a PCR
reaction on the basis of the degree of fluorescence in the reaction mix. A low level of fluorescence
indicates no amplification or very poor amplification and a high level of fluorescence indicates good
amplification. This system can be adapted to permit identification of known sequence variations,
without the need for any post-PCR analysis other than fluorescence emission analysis.

PCR amplification is detected by measuring the 5' to 3' exonuclease activity of Taq
polymerase. Taq polymerase cleaves 5' terminal nucleotides of double stranded DNA. The preferred
substrate for Taq polymerase is a partially double stranded molecule. Taq polymerase cleaves the
strand that contains the closest free 5' end. According to the 5' nuclease assay, an oligonucleotide
'probe' which is phosphorylated at its 3' end, so as to render it incapable of serving as a DNA
synthesis primer, is included in the PCR reaction. The probe is designed to anneal to a position
between the two amplification primers. When an actively extending Taq polymerase molecule reaches
the probe molecule, it partially displaces the probe and then cleaves the probe at or near the single
stranded/double stranded cleavage site. The polymerase continues this process of displacement and
cleavage until the entire probe is broken up and removed from the template. The probe is labeled in a
manner that permits detection of the removal of the probe. In particular, the probe is labeled at
different positions with two different fluorescent labels. One label has a localized quenching effect on
the fluorescence of the other (reporter) label. This effect is mediated by energy transfer from one dye
to the other, and requires that the two dyes are in close proximity to each other. If the probe is cleaved
at a position between the reporter and the quencher dyes, the two dyes become physically separated,
thereby resulting in an increase in fluorescence which is proportional to the yield of the PCR product.

Representational Difference Analysis (RDA)

Genotyping according to the invention can also be carried out by Representational Difference
Analysis (RDA). RDA is described in detail in Lisitsyn et al., 1993, Science 259:946, and an
adaptation which combines selective breeding with RDA is described in Lisitsyn et al., 1993, Nature
Genet. 6:57. RDA identifies sequence dissimilarities through the application of a powerful approach to
subtractive hybridization. According to the method of RDA, one first creates simplified
representations, called amplicons, from two samples that are being compared. An amplicon can
comprise, for example, the set of BglII fragments that are small enough to be amplified by the PCR.
The iterative subtraction step begins with the ligation of a special adaptor to the 5' end of fragments
contained in the amplicon derived from the test sample (tester amplicon). The tester amplicon is then
melted and briefly reannealed in the presence of a large excess of amplicon derived from the wild
type sample (driver amplicon). Those tester fragments that reanneal (presumably fragments absent
from the wild type, driver amplicon) can serve as a template for the addition of the adaptor sequence
to the 3' end of the "partner" fragment. As a result, these tester fragments can be exponentially
amplified by PCR. This procedure is then repeated to achieve successively higher enrichment.

RDA may be used to clone sequences that are either wholly absent from the wild type sample
or are present in the wild type DNA, but are contained in a restriction fragment that is too large to be
amplified in the amplicon. The former case may arise from a total deletion; the latter from a restriction
fragment length polymorphism with the short allele present in the tester but not the wild type DNA.
RDA is useful for subtracting DNA from an individual with a particular disease from normal DNA so
as to identify regions showing homozygous or heterozygous deletions; locating fragments present in a
parent with a dominant disorder but absent in his unaffected offspring; and locating mRNAs expressed
in normal tissue but not present in tissue isolated from an individual with a particular disease.

Denaturing High Performance Liquid Chromatography

According to the scanning method of Denaturing High Performance Liquid Chromatography
(DHPLC), partial heat denaturation and a linear acetonitrile column are used to identify
polymorphisms in DNA fragments. DHPLC provides a method of comparative DNA sequencing
based on the capability of ion-pair reverse phase liquid chromatography on alkylated nonporous
poly(styrene divinylbenzene) particles to resolve homo- from heteroduplex molecules under conditions
of partial denaturation. This method can potentially be automated to allow for rapid analysis of a large
number of samples (Underhill et al., 1996, Proc. Natl. Acad. Sci. USA, 93:196).

Mass Spectroscopy

Matrix-assisted laser desorption-ionization-time-of-flight (MALDI-TOF) mass spectroscopy is
another method according to the invention by which genotyping can be performed. The method of
MALDI-TOF mass spectroscopy is based on the irradiation of crystals formed by suitable small
organic molecules (referred to as the matrix) with a short laser pulse at a wavelength close to the
resonant absorption band of the matrix molecules. This causes an energy transfer and desorption
process producing matrix ions. Low concentrations of nucleic acid molecules are added to the matrix
molecules while in solution and become embedded in the solid matrix crystals upon drying of the
mixture. The intact nucleic acids are then desorbed into the gas phase and ionized upon irradiation with
a laser, allowing their mass analysis. MALDI is used primarily with time-of-flight spectrometers,
where the time of flight is related to the mass-to-charge ratio of the nucleic acid molecules.
Reviewed in Griffin T.J. and Smith L.M., 2000, Trends Biotech 18:77.

Genotyping can be performed by any of the following MALDI-TOF mass spectroscopy
approaches, including sequencing of PCR products (Fu, D-J et al., 1998, Nat. Biotechnol. 16:381;
Kirpekar, F. et al., Nucleic Acids Res. 26:2554), direct mass-analysis of PCR products (Ross, P.L. et
al., 1998, Anal. Chem. 70:2067), analysis of allele-specific PCR (Taranenko, N.I. et al., 1996, Genet.
Anal. Biomol. Eng. 13:87) or LCR (ligase chain reaction; Jurinke, C. et al., 1996, Anal. Biochem.
237:174) products, analysis of RFLP-PCR products (Srinivasan, J.R. et al., 1998, Rapid Commun.
Mass Spectrom. 12:1045), minisequencing (Haff, L.A. and Smirnov, I.P., 1997, Genome Res. 7:378;
Higgins, G.S. et al., 1997, BioTechniques 23:710), analysis of PNA (peptide nucleic acid) hybridization
probes (Griffin, T.J. et al., 1997, Nat. Biotech. 15:1368; Ross, P.L., Anal. Chem. 69:4197;
Jiang-Baucom, P. et al., 1997, Anal. Chem. 69:4894), or direct analysis of invasive cleavage products
(Griffin, T.J. et al., 1999, Proc. Natl. Acad. Sci. USA 96:6301).

6. Methods of Specifying a Polymorphism

The invention provides methods for specifying a particular polymorphism. By "specifying a
polymorphism" is meant defining a polymorphism in the context of a larger region of nucleic acid
which contains the polymorphism, and is of sufficient length to be easily differentiated from any other
position in the genome.

A unique nucleotide position (e.g. a polymorphic site) in the human genome can be specified
by describing a unique sequence of DNA within the genome, and providing the location of the unique
nucleotide position relative to that sequence. Preferably this is done by providing the sequence identity
of a length of unique DNA containing the polymorphism, and indicating which of the nucleotide sites is
polymorphic.

A calculation can be made to determine a sequence length which will be unique in the 3 billion
nucleotide human genome. If it is assumed that the genome contains equal numbers of the nucleotides
A, C, G and T, and that they occur randomly in the genome, one can determine the probability of any
given sequence of a defined length occurring in the genome: a random 12mer will appear in a random
3,000,000,000 bp genome 179 times, a random 15mer will appear in a random 3,000,000,000 bp
genome 3 times and a random 16mer will appear in a random 3,000,000,000 bp genome 1 time.
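
The figures above can be reproduced with a one-line calculation: under the stated assumptions (equal base frequencies, random order), a given k-mer is expected to occur about N / 4^k times in a genome of N bases. The sketch below is illustrative only; it counts one strand and ignores edge effects, and the function name is an assumption.

```python
def expected_occurrences(k, genome_size=3_000_000_000):
    """Expected count of one specific k-mer in a random genome of equal base composition."""
    return genome_size / 4 ** k


for k in (12, 15, 16):
    print(k, round(expected_occurrences(k)))
# 12 179
# 15 3
# 16 1
```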

Thus, it would appear that specifying 16 bp would uniquely define a sequence in the genome.
However, the genome is not composed of random sequence and does not contain equal amounts of A,
G, C and T. In fact, 10-12 bp sequences are likely to be specific for 95% of genes. Some sequences
may even be specified by as few as 8 nucleotides. The minimum sequence length that is useful
according to the invention for identifying polymorphisms in most gene and intergenic sequences is
approximately 9-15 bp.

In the case of repeat sequences and sequences associated with gene families, the probability
of observing a particular sequence is greatly increased and it becomes difficult to specify a
polymorphism in the context of a sequence that is only on the order of 9-15 bp. There are many types
of repeats including tandem repeats, where a larger sequence block has within it smaller repeat units
(e.g. microsatellites). Tandem repeats usually occur within non-genic areas, but can also occur within
genes and subsequently affect gene function; they can be 10-1000s of bp long, or, if located in
centromeres and telomeres, be megabase sized. Some repeats are composed of blocks which do not
have sub-repeat units and are non-functional (e.g. ~300 bp Alu repeats). These occur by
duplication/dispersal throughout the genome.

It may be difficult to specify a polymorphism that occurs in a gene that is a member of a gene
family. Through the mechanism of gene duplication, gene families, comprising multiple copies of a
gene in which some, but not all, of the DNA sequence has diverged, have been formed. Thus, certain
regions of a gene may be conserved in different gene family members. With time, a duplicated gene
can lose function and the sequence of the duplicated gene can deteriorate; the amount of homology
between the original gene and the duplicated version depends upon the time since duplication. Other
duplications maintain function and retain some level of similarity with the original gene in the important
domains. Some related genes can share nearly 100% homology across a region that is hundreds of bp
long, and yet have no significant homology at any other location. In these cases, it may be necessary
to specify dozens or more nucleotides to provide a unique sequence.
To identify a unique sequence, a search must be done wherein a specific sequence is
compared to all known human sequences and the minimum unique sequence is defined. However, in
the absence of a complete sequence for the human genome, it cannot be guaranteed that a sequence
is truly unique. Empirical experimentation can be used to determine the minimum sequence for
specificity/uniqueness. In the case of a gene family member, if sequence information is available for
the region corresponding to the region of interest in other members of the gene family, then it may be
possible to define a unique short (9-15 bp) sequence that contains a polymorphism and has specificity.
In the event that a particular region cannot be defined as unique, a larger region of nucleic acid which
contains the polymorphism will be required to define a polymorphism in a gene that is a member of a
gene family. It is predicted that a sequence of 9-15 bp will be sufficient to define a polymorphism in
99% of all cases.
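
The search for a minimum unique sequence can be illustrated with a short sketch. It assumes that the gene of interest and the other gene family members are available as plain strings, compares one strand only, and treats "unique" as "absent from the sequences supplied"; the function name and the example sequences are hypothetical and are not part of the described method.

```python
def min_unique_window(target, site_index, others, min_len=9, max_len=None):
    """Return the shortest substring of `target` that covers `site_index`
    and is absent from every sequence in `others`, or None if there is none."""
    max_len = max_len or len(target)
    for length in range(min_len, max_len + 1):
        # Try every window of this length that still covers the polymorphic site.
        first = max(0, site_index - length + 1)
        last = min(site_index, len(target) - length)
        for start in range(first, last + 1):
            window = target[start:start + length]
            if not any(window in seq for seq in others):
                return window
    return None


# Hypothetical gene and paralog that differ at a single position (index 5).
gene = "ATGGCGTACGTTAGCCTAGGATCC"
paralog = "ATGGCATACGTTAGCCTAGGATCC"
window = min_unique_window(gene, 12, [paralog])
```

In this toy case a 9 bp window suffices only because it reaches the one position at which the two sequences differ; the more similar the family members are around the site, the longer the window has to be.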

Methods of specifying a polymorphism that involve using sequences which either encompass
or overlap the polymorphic site to be tested or do not encompass or overlap the polymorphic site to be
tested are useful according to the invention and are described below.

Oligonucleotide Hybridization. 

An oligonucleotide is designed such that it is specific for a target sequence, and hybridizes
only at the target sequence site. This oligonucleotide will not hybridize if the target sequence differs at
the position in the sequence to be tested. Another oligonucleotide is designed such that it hybridizes
with the polymorphic form of the sequence. A DNA sample is tested for hybridization with each of
the two probes independently. If the DNA hybridizes to only one of the probes, it can be concluded
that the individual is homozygous for the corresponding sequence. If both probes hybridize to a test
DNA sample, then the individual is heterozygous. Hybridization will be detected by the method of
Southern blot analysis (as described in Section C entitled "Production of a Nucleic Acid Probe").
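
The interpretation of the two independent hybridizations can be written as a small decision rule. This is a minimal sketch that assumes each hybridization result has already been reduced to a yes/no signal; the function name and the "no call" outcome are illustrative additions.

```python
def call_genotype(target_probe_hybridizes, polymorphic_probe_hybridizes):
    """Interpret two independent probe hybridizations for one DNA sample."""
    if target_probe_hybridizes and polymorphic_probe_hybridizes:
        return "heterozygous"
    if target_probe_hybridizes:
        return "homozygous for the target sequence"
    if polymorphic_probe_hybridizes:
        return "homozygous for the polymorphic sequence"
    # Neither probe hybridized: the sample or the hybridization should be rechecked.
    return "no call"


result = call_genotype(True, True)
```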

Specifying a Polymorphism by PCR

An alternative method for specifying a particular polymorphism involves a PCR-based
strategy. According to this method, a region of a candidate gene to be tested is amplified by PCR (as
described). The amplified fragment is digested with a restriction enzyme that will not cut a fragment
that contains a polymorphism, due to the location of the polymorphism within the recognition site of this
restriction enzyme. The products of the digestion reaction mixture are size separated in an agarose gel,
stained with ethidium bromide, and visualized under ultraviolet light to determine if the amplified
product has been digested. According to this method, the PCR primers provide the specificity for a
particular polymorphism by virtue of the specific sequence of the two primers, as well as by the
location of the primer binding sites in the target DNA. Although multiple sites for primer binding may
exist in a target DNA sequence, only the sites that are close enough together will produce an amplified
product that includes the nucleic acid region containing the polymorphism.
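
The digestion step can be sketched in code as follows. The example assumes the enzyme EcoRI (recognition site GAATTC, cut after the first G) purely for illustration, and the amplicon sequences are hypothetical; the source does not name a particular enzyme.

```python
def digest(amplicon, site, cut_offset):
    """Return the fragment lengths produced by cutting `amplicon` at every
    occurrence of `site`; the cut falls `cut_offset` bases into the site."""
    fragments, start = [], 0
    pos = amplicon.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(cut - start)
        start = cut
        pos = amplicon.find(site, pos + 1)
    fragments.append(len(amplicon) - start)
    return fragments


def polymorphism_present(amplicon, site="GAATTC", cut_offset=1):
    """An undigested amplicon (single fragment) indicates a disrupted site."""
    return len(digest(amplicon, site, cut_offset)) == 1


wild_type = "ACGTACGTAC" + "GAATTC" + "TTGACCA"   # intact recognition site
variant = "ACGTACGTAC" + "GAGTTC" + "TTGACCA"     # site disrupted by a substitution
wild_type_bands = digest(wild_type, "GAATTC", 1)
variant_bands = digest(variant, "GAATTC", 1)
```

On a gel, the wild-type amplicon would appear as two smaller bands and the variant as a single full-length band; a heterozygous sample would show all three.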

Alternatively, a PCR reaction is carried out with PCR primers that contain polymorphisms.
According to this embodiment, if the template nucleic acid lacks the polymorphism present in the
primers there will be no PCR product. Thus, according to this embodiment of the invention, the
absence of a PCR product indicates that a polymorphism is not present in the target sequence.

Primer Extension

A DNA fragment comprising the region containing a polymorphism is PCR amplified from an
individual to be tested. The PCR product is denatured and one strand is retained for analysis. An
oligonucleotide probe is designed such that it is specific for a region in the sequence and hybridizes
such that its 3' terminal nucleotide is paired with the nucleotide adjacent to the one to be tested. The
PCR product and probe are combined with a polymerase and terminating, differentially colored
nucleotides. The polymerase extends the probe by one base, and only the base which is
complementary to the site being tested is added. The reaction is washed, and the color of the reaction
indicates the nucleotide that has been added and the sequence at the position of interest.
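
The single-base extension can be modeled in a few lines. In this sketch the retained strand is written 3' to 5' so that the probe, written 5' to 3', reads alongside it from left to right; the sequences and the function name are hypothetical, and perfect base pairing is assumed.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_one_base(template_3to5, probe_5to3):
    """Return the terminating nucleotide added to the probe's 3' end,
    or None if the probe does not pair or no template base follows it."""
    # The stretch of template that pairs with the probe.
    footprint = "".join(COMPLEMENT[base] for base in probe_5to3)
    start = template_3to5.find(footprint)
    if start == -1:
        return None
    site = start + len(footprint)  # template position immediately past the probe
    if site >= len(template_3to5):
        return None
    return COMPLEMENT[template_3to5[site]]


probe = "CGTACGGT"
added_to_allele_1 = extend_one_base("TTGCATGCCAGT", probe)  # tested base is G
added_to_allele_2 = extend_one_base("TTGCATGCCAAT", probe)  # tested base is A
```

The identity of the added terminator (read out by its color in the assay) reports the base at the tested position.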

The PCR step provides one level of specificity by amplifying a region (1 - 10000 bp as desired
between the PCR primers) from a complex (3,000,000,000 bp) mixture. The PCR primers must
be unique in both their hybridization specificity and their proximity to one another. Since proximity of
the two PCR primers is needed (i.e. a distance across which a polymerase can extend to join the
primers), shorter PCR primers can be used, e.g. in theory a small enough region could be amplified
with an 8-10 bp binding site for a PCR primer. To ensure that a primer hybridizes with specificity, a
primer must be at least 5 bp.
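
A rough calculation shows why a short primer can be specific within a small fragment even though it is not unique in the genome. The sketch below assumes a uniformly random sequence with equal base frequencies, which real genomes are not, so the figures are order-of-magnitude estimates only; the function name is hypothetical.

```python
def expected_random_matches(primer_len, target_len, both_strands=True):
    """Expected number of chance perfect matches to a primer of `primer_len`
    bases in a random sequence of `target_len` bases."""
    positions = max(target_len - primer_len + 1, 0)
    strands = 2 if both_strands else 1
    return strands * positions / 4 ** primer_len


# A 5 bp primer on one strand of a 1000 bp fragment: about one chance match.
in_fragment = expected_random_matches(5, 1000, both_strands=False)
# The same primer against a 3,000,000,000 bp genome: millions of chance matches.
in_genome = expected_random_matches(5, 3_000_000_000)
# Roughly 16 bp are needed before chance matches in the genome fall to about one.
long_primer = expected_random_matches(16, 3_000_000_000)
```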

A second level of specificity is provided by the primer which is extended in the primer
extension reaction. Since this primer is hybridizing to a short piece of DNA, it can be short and unique
for the fragment with which it binds. The primer is at least 5 bp and preferably 8 bp. Although the
primer used for the primer extension step is located adjacent to the polymorphic site, the PCR
primers should not overlap with the polymorphic site being tested.

Southern Blotting

One method for detecting a previously defined polymorphism involves Southern blot analysis
of wild type and mutant DNA following digestion with a restriction enzyme which has a recognition
sequence which includes the polymorphic site to be tested. According to this method, a particular
restriction enzyme cuts wild type DNA but does not cut mutant DNA due to the presence of a
polymorphism within the recognition site of this restriction enzyme. Many restriction enzymes exist
which recognize 4 bp. The resulting fragments will be size separated in an agarose gel, transferred to
a membrane and probed with a nucleic acid probe. If the site is uncut, the fragment is one length and
if the site is cut the fragment will be of a shorter length.
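
The expected band sizes can be worked out from the positions of the restriction sites and of the probe. The following sketch uses hypothetical coordinates on a 10,000 bp stretch of DNA and assumes that the probe lies entirely within one fragment.

```python
def detected_fragment_length(cut_sites, probe_pos, dna_length):
    """Length of the restriction fragment that contains `probe_pos`."""
    bounds = [0] + sorted(cut_sites) + [dna_length]
    for left, right in zip(bounds, bounds[1:]):
        if left <= probe_pos < right:
            return right - left
    raise ValueError("probe position lies outside the DNA")


FLANKING_SITES = [2000, 7000]   # sites present in both wild type and mutant DNA
POLYMORPHIC_SITE = 5000         # cut only in wild type DNA
PROBE_POSITION = 3000

wild_type_band = detected_fragment_length(
    FLANKING_SITES + [POLYMORPHIC_SITE], PROBE_POSITION, 10000)
mutant_band = detected_fragment_length(FLANKING_SITES, PROBE_POSITION, 10000)
```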

The nucleic acid hybridization probe will provide specificity to the particular polymorphism
being tested by defining the polymorphism in the context of a larger stretch of nucleic acid sequence.
The nucleic acid probe may comprise the nucleic acid sequence corresponding to the region known to
contain the polymorphism. The sequence-specific probe may be located 10, 100, 1000, or even 100s of
thousands of bases from the region containing the polymorphism. If the probe is located some distance
from the region containing the polymorphism, an intervening recognition site for the restriction enzyme
must not be located between the probe hybridization site and the region of interest containing the
polymorphic site. Typically, a hybridization probe useful according to this method will be much larger
than the minimum length of a sequence (9-15 bp) required to give specificity to, or define, a particular
polymorphism.

Alternatively, a chemical or enzyme which recognizes a unique pair of nucleotides at the site
of a polymorphism can be used to detect the polymorphism. According to this method, the amount of
sequence required for recognition by a chemical or enzyme is 2 bp (providing that the 2 bp sequence is
unique in a region large enough to produce a fragment which can then be bound by a specific probe).

According to a variation of the above method, a labeled chemical or enzyme which binds to
one sequence of the polymorphic recognition site and not another is used. This method involves the
steps of digesting the DNA with a restriction enzyme, and adding a labeled, sequence-specific binding
protein (e.g. a restriction enzyme that lacks cleavage capability). The sequence-specific binding
protein will bind to multiple sites in the genome, including the site to be tested. The fragments will be
separated on a gel and then probed with a probe specific for the test sequence. If the fragment
identified by the second probe is identical to a fragment identified by the first probe (e.g. the labeled
chemical or enzyme), then the sequence being tested for is present.

7. Determination of the Phenotypic Outcome of a Polymorphism 

To determine the phenotypic outcome of a polymorphism according to the invention, it is
necessary to screen suitable populations to obtain a statistically significant measure of the association
of a polymorphism with a particular disease (e.g. osteoarthritis). The invention provides methods for
performing polymorphism genotyping in appropriate populations (described above). The invention also
provides in vitro and in vivo assays useful for determining the phenotypic outcome of a polymorphism
in a candidate gene.

Every polymorphism has the potential to alter the genetic activity of an individual. At the level
of a single gene, the effect of a polymorphism can range from an inconsequential, silent change, to a
change that causes a complete loss of protein function, to a mutation that confers a gain of aberrant or
detrimental function. The severity of the effect of a polymorphism on gene activity will depend on the exact
molecular consequences of the particular polymorphism. For example, alteration of a single pre-mRNA
splicing dinucleotide could have profound effects on both the quantitative and qualitative
properties of gene activity, since alterations in splicing efficiency can both reduce the overall level of
normal transcript and cause "exon skipping". If the skipped exon is a coding exon then
exon skipping will lead to an alteration in the amino acid composition of the resulting protein and likely
affect protein activity. To accurately assess the role of a particular polymorphism in the regulation of
various molecular events, appropriate assays for both gene expression and protein function must be
carried out.
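
For a polymorphism that falls within a coding codon, the first-pass distinction between a silent change and a change in the protein can be made from the standard genetic code. The sketch below builds the standard codon table and classifies a single codon substitution; the category labels and function name are illustrative, and the sketch does not address splicing or regulatory effects, which require the assays described here.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_codon_change(ref_codon, alt_codon):
    """Classify a coding substitution by its effect on the encoded residue."""
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "silent"
    if alt_aa == "*":
        return "nonsense"      # premature stop codon
    if ref_aa == "*":
        return "stop-loss"     # stop codon replaced by an amino acid
    return "missense"


silent_example = classify_codon_change("CTT", "CTC")    # Leu -> Leu
missense_example = classify_codon_change("GAG", "GTG")  # Glu -> Val
nonsense_example = classify_codon_change("TGG", "TGA")  # Trp -> stop
```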

In vitro assays useful for determining the effects of a polymorphism on gene expression and
protein function include, but are not limited to, the following.

i. Transcriptional Regulation

The transcriptional regulation of a candidate gene containing a polymorphism may be altered,
as compared to the wild type gene.

Promoter Activity

If a polymorphism is located in the promoter, enhancer or repressor region of a candidate
gene, promoter assays (well known in the art) wherein the altered promoter of the candidate gene is
used to drive the expression of a reporter gene (e.g. CAT, luciferase, GFP) are performed. Changes
in the transcriptional regulation of a candidate gene due to the presence of a polymorphism can also be
detected by methods useful for measuring the level of mRNA, including S1 nuclease mapping and
RT-PCR.



S1 Analysis

The S1 enzyme is a single-strand-specific endonuclease that will digest both single-stranded RNA
and DNA. According to the method of S1 analysis, a probe that has been efficiently labeled to a high
specific activity at the 5' end through the use of a kinase is used to determine either the amount of an
mRNA species or the 5' end of a message. A single stranded probe that is complementary to the
sequence of the RNA species of interest is utilized in S1 analysis. If the structure of a particular
mRNA species is known, S1 analysis is performed with oligonucleotide probes of at least 40 bp that
are complementary to the RNA of interest. It is preferable to use oligonucleotides wherein the 5' end
of the oligonucleotide is complementary to the RNA. It is also preferable to use oligonucleotides
wherein the 5' terminal residues contain dG or dC residues. If S1 nuclease analysis will be utilized to
determine the 5' terminus of an RNA species, the 3' end of the oligonucleotide should extend at least 4
nucleotides beyond the RNA coding sequence. The inclusion of additional nucleotides facilitates
differentiation of a band resulting from an RNA:DNA duplex and a band representing the probe.

A hybridization probe for S1 analysis is prepared by incubating 2 pmol of an oligonucleotide in
the presence of 150 µCi [γ-32P]ATP (3000-7000 Ci/mmol), 2.5 µl 10X T4 polynucleotide kinase buffer
(700 mM Tris-Cl, pH 7.5, 100 mM MgCl2, 50 mM dithiothreitol, 1 mM spermidine-Cl, 1 mM EDTA),
and 10 U T4 polynucleotide kinase at 37°C for 30-60 minutes. The radiolabeled probe is ethanol
precipitated and resuspended at 1 µl/0.3 ng oligonucleotide or 10^ cpm.

The hybridization reaction is performed as follows. An amount of probe equal to 5x10*
Cerenkov counts is added to 50 µg RNA on ice and ethanol precipitated. The resulting pellet is
resuspended in 20 µl S1 hybridization solution (80% deionized formamide, 40 mM PIPES, pH 6.4,
400 mM NaCl, 1 mM EDTA, pH 8), denatured for 10 min at 65°C and hybridized overnight at 30°C.
The following day, 300 µl of a mixture of 150 µl 2X S1 nuclease buffer (0.56 M NaCl, 0.1 M sodium
acetate, pH 4.5, 9 mM ZnSO4), 3 µl 2 mg/ml single-stranded calf thymus DNA, 147 µl H2O and 300 U
S1 nuclease is added to the hybridization reaction and incubated for 60 minutes at 30°C. Following the
addition of 80 µl S1 stop buffer (4 M ammonium acetate, 20 mM EDTA, 40 mg/ml tRNA) the sample is
ethanol precipitated, resuspended in formamide loading dye, denatured and analyzed on a denaturing
polyacrylamide/urea gel of the appropriate percentage for the expected size of the protected band
(Ausubel et al., supra).

RT-PCR

The method of RT-PCR is useful according to the invention for RNA expression analysis.
According to the method of reverse transcription/polymerase chain reaction (RT-PCR), during the
reverse transcription (RT) step the RNA is converted to first strand cDNA, which is relatively stable
and is a suitable template for a PCR reaction. In the second step, the cDNA template of interest is
amplified using PCR. This is accomplished by repeated rounds of annealing sequence-specific
primers to either strand of the template and synthesizing new strands of complementary DNA from
them using a thermostable DNA polymerase.

An RNA sample is ethanol precipitated with a cDNA primer. It may be preferable to use a
cDNA primer that is identical to one of the amplification primers. To the pellet is added 12 µl H2O,
4 µl 400 mM Tris-Cl, pH 8.3, and 4 µl 400 mM KCl. The mixture is heated to 90°C, slow cooled to
67°C, microfuged and incubated for 3 hours at 52°C. Following the addition of 29 µl reverse
transcriptase buffer (per sample: 2.5 µl 400 mM Tris-Cl, pH 8.3, 2.5 µl 400 mM KCl, 1 µl 300 mM MgCl2,
5 µl 100 mM DTT, 5 µl 5 mM 4dNTP mix, 2 µl actinomycin D, 11 µl H2O) and 0.5 µl (16 U) AMV
reverse transcriptase, the sample is incubated for 1 hour at a temperature between 37°C and 55°C.
The temperature will be adjusted in accordance with the composition of the primer and the RNA of
interest. The sample is then extracted sequentially with phenol and chloroform, and ethanol
precipitated. The resulting cDNA pellet is resuspended in 40 µl H2O. 5 µl of the cDNA sample is
mixed with 5 µl of each amplification primer (~20 µM each), 4 µl 5 mM 4dNTP mix, 10 µl 10X
amplification buffer (500 mM KCl, 100 mM Tris-Cl, pH 8.4, 1 mg/ml gelatin) and 70.5 µl H2O. After the
mixture is heated for 2 minutes at 94°C, 0.5 µl (2.5 U) Taq DNA polymerase is added and the sample
is overlaid with mineral oil. PCR amplification of the cDNA will be performed using the following
automated amplification cycles: 39 cycles (2 minutes at 55°C, 2 minutes at 72°C, 1 minute at 94°C), 1
cycle (2 minutes at 55°C, 7 minutes at 72°C). The number of cycles can be varied in accordance with
the abundance of RNA (Ausubel et al., supra).
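
The reaction volumes and cycling times given for this protocol can be checked arithmetically. The sketch below assumes that all component volumes are in microliters and that two amplification primers are used; the dictionary keys are descriptive labels only, and the cycling time counts the programmed hold times, not the time the instrument spends changing temperature.

```python
# Component volumes in microliters (assumed unit).
RT_BUFFER_UL = {
    "400 mM Tris-Cl, pH 8.3": 2.5,
    "400 mM KCl": 2.5,
    "300 mM MgCl2": 1.0,
    "100 mM DTT": 5.0,
    "5 mM 4dNTP mix": 5.0,
    "actinomycin D": 2.0,
    "H2O": 11.0,
}
PCR_MIX_UL = {
    "cDNA": 5.0,
    "amplification primer 1": 5.0,
    "amplification primer 2": 5.0,
    "5 mM 4dNTP mix": 4.0,
    "10X amplification buffer": 10.0,
    "H2O": 70.5,
    "Taq DNA polymerase": 0.5,
}


def total_volume(mix):
    """Sum the component volumes of a reaction mix."""
    return sum(mix.values())


def final_concentration(stock_conc, stock_volume, reaction_volume):
    """Concentration after dilution, in the same unit as `stock_conc`."""
    return stock_conc * stock_volume / reaction_volume


rt_buffer_total = total_volume(RT_BUFFER_UL)            # stated as 29 in the protocol
pcr_total = total_volume(PCR_MIX_UL)                    # 10X buffer at 10 of 100 gives 1X
dntp_final_mm = final_concentration(5.0, 4.0, pcr_total)
hold_minutes = 39 * (2 + 2 + 1) + (2 + 7)               # 39 cycles plus the final cycle
```

The components of the reverse transcriptase buffer sum to the stated 29, and the PCR mix sums to 100, so that the 10X amplification buffer is diluted to 1X.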

If a polymorphism is located in a transcription factor binding site, assays including but not
limited to the yeast two-hybrid assay (Fields et al., 1994, Trends Genet., 10:286) can be used to
determine the effects of a polymorphism on transcription factor binding.

If the protein product of the gene of interest is a DNA binding protein, the phenotypic outcome
of a polymorphism may be impaired nuclear transport, DNA binding, chromatin assembly or chromatin
structure, methylation or histone deacetylation.

Nuclear Transport 

Immunocytochemical methods or cell fractionation techniques (as described above) are used
to determine if the protein is correctly localized in the nucleus.

The DNA binding properties of a transcription factor are determined by gel shift analysis (as
described in Ausubel et al., supra), oligonucleotide selection, southwestern assays or by
immunohistochemical analysis of fixed chromosomes.

Gel Shift Analysis 

The method of gel shift analysis is used to detect sequence specific DNA-binding proteins
from crude extracts. According to this method, proteins that bind to an end-labeled DNA fragment will
retard the mobility of the fragment. The change in the mobility of the labeled fragment is detected by
the appearance of a discrete band comprising the DNA-protein complex.

A number of methods for preparing nuclear and cytoplasmic extracts useful for gel shift
analysis are known in the art. For example, nuclear extracts are prepared according to the following
method. A cell pellet is washed in PBS, resuspended in a volume of hypotonic buffer (10 mM HEPES,
pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.2 mM PMSF, 0.5 mM DTT) that is approximately equal to 3
times the packed cell volume and allowed to swell on ice for 10 minutes. Cells are homogenized in a
glass Dounce homogenizer and the nuclei are collected by centrifugation and resuspended in a volume
of low-salt buffer (20 mM HEPES, pH 7.9, 25% (v/v) glycerol, 1.5 mM MgCl2, 0.02 M KCl, 0.2 mM
EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to one-half of the packed nuclear volume. Following
the dropwise addition to the nuclei, with stirring, of a volume of high-salt buffer (20 mM HEPES, pH 7.9,
25% (v/v) glycerol, 1.5 mM MgCl2, 1.2 M KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) equivalent to
one-half of the packed nuclear volume, nuclear extraction is carried out for 30
minutes with continuous gentle stirring. The nuclei are collected by centrifugation and the nuclear
extract is dialyzed against 50 volumes of dialysis buffer (20 mM HEPES, pH 7.9, 20% (v/v) glycerol,
100 mM KCl, 0.2 mM EDTA, 0.2 mM PMSF, 0.5 mM DTT) until the conductivities of extract and
buffer are equivalent. The extract is removed from the dialysis tubing and analyzed for protein
concentration (Ausubel et al., supra).

Probes useful for gel shift analysis include a fragment of plasmid DNA or a gel-purified
double stranded oligonucleotide. Preferably the probe is labeled with Klenow fragment by incubating a
100 µl solution of plasmid DNA or oligonucleotide with 100 µCi of the desired [α-32P]dNTP, 4 µl of 5
mM 3dNTP mix and 2.5 U Klenow fragment for 20 minutes at room temperature. Upon the addition
of 4 µl of a solution comprising 5 mM of the dNTP corresponding to the radioactive dNTP, the sample
is incubated for 5 minutes at room temperature. The radiolabeled probe is ethanol precipitated,
resuspended in TE buffer and gel purified.

Gel shift analysis is performed by incubating 10,000 cpm of the labeled probe (0.1-0.5 ng) with
2 µg poly(dI-dC)-poly(dI-dC), 300 µg BSA, and approximately 15 µg of a nuclear extract or buffered
crude protein extract prepared, for example, as described above, for 15 minutes at 30°C. An aliquot of
the binding reaction is analyzed by electrophoresis on a prewarmed low-ionic strength gel (e.g. a 4%
polyacrylamide gel in TBE) and autoradiography (Ausubel et al., supra).

Oligoselection Assays for DNA Binding Activity

DNA binding activity is an essential property of proteins involved in many basic cell biological
events, such as chromatin structure, transcriptional regulation, DNA replication and repair. The
biological activity of a DNA binding protein can be assayed by defining the optimal target DNA
binding site. Using the PCR based primer selection technique (Blackwell, 1990, Science, 250:1104) the
canonical nucleotide sequence defining the binding site is elucidated in vitro by mixing purified full
length protein, or just the DNA binding domain of a protein of interest, with an oligonucleotide duplex
pool containing a completely randomized central region flanked by primer-annealing sites. Multiple
rounds of immunoprecipitation and amplification by PCR enrich for high affinity sites, which are
cloned and sequenced in order to define a canonical binding site.
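
Once the selected sites have been cloned and sequenced, a canonical site can be summarized as the most frequent base at each position. The sketch below assumes that the sequenced sites have already been aligned and trimmed to equal length; the example sequences are hypothetical, and ties are resolved in favor of the base seen first.

```python
from collections import Counter


def consensus(sites):
    """Return the most frequent base at each position of aligned sites."""
    if len({len(site) for site in sites}) != 1:
        raise ValueError("sites must be non-empty and of equal length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*sites)
    )


# Hypothetical sequences recovered after several rounds of selection.
selected_sites = ["CACGTG", "CACGTG", "CATGTG", "CACGTT"]
canonical_site = consensus(selected_sites)
```

A position weight matrix would retain more information than a single consensus string, but the consensus is enough to compare a polymorphic site against the preferred binding sequence.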

The ability of a DNA binding protein to correctly regulate chromatin assembly and structure
can be determined by DNase hypersensitivity assays. Alternatively, coimmunoprecipitation
experiments or Western blot analysis can be used to determine if the DNA binding protein is
associated with a component of the chromatin.

Southwestern Blot Assay for Protein-DNA Interactions

The ability of a protein to bind DNA is measured by using the "Southwestern" blot technique
(for example see Antalis et al., 1993, Gene, 134:201). According to this method, radiolabelled DNA is
incubated with protein that has been immobilized on nitrocellulose filters and the amount of bound
DNA is measured by scintillation counting or autoradiography followed by densitometry. The protein
to be tested can be pure protein, immunoprecipitated protein, crude cell lysates or even recombinant
protein denatured directly from bacterial colonies, yeast or cell culture.

Assay of Protein Binding to Chromosomes in Vivo: Immunocytology of Fixed Chromosomes

Numerous biologically important nuclear proteins are in direct contact with genomic DNA.
The presence of these proteins can be detected immunocytologically by fixing metaphase
chromosomes such that the protein is permanently fixed at the region of DNA to which it normally
binds. The presence and cytological location of the protein can then be determined by incubating the
fixed chromosomes with an antibody directed against the protein of interest, and performing standard
methods of immunohistochemical staining (Zink and Paro, 1989, Nature, 337:468).

Immunoprecipitation Assay for Chromatin Assembly/Structure

If an antibody specific for a protein of interest exists, immunoprecipitation can be used to test
for the presence of the protein (Otto and Lee, 1993, Methods Cell Biol., 37:119; Banting, 1995, in
Gene Probes 1: A Practical Approach, Chapter 8: Antibody probes, pp. 225-227, IRL Press). The
following methods are used for determining if a protein of interest is associated with a particular
subcellular component. According to one method, proteins are immunoprecipitated with an antibody
specific for a cellular component (e.g. chromatin or nuclear antigens), the immunoprecipitated material
is analyzed by denaturing polyacrylamide gel electrophoresis, and western blot analysis is
performed with an antibody specific for the protein of interest, to determine if a physical association
exists between the cellular component and the protein of interest. Various incubation and wash
treatments of the cell lysate are used to remove background contamination and enhance the sensitivity
of detection (Banting, 1995, supra). Alternatively, the initial immunoprecipitation can be carried out
with the antibody specific for the protein of interest, and the western blot analysis can be performed
with an antibody specific for a cellular component. According to a variation of this method, prior to
immunoprecipitation the cells can be treated with a protein crosslinker to ensure that protein-protein
interactions are maintained during immunoprecipitation. According to another variation of this method,
proteins can be cross-linked to DNA and then precipitated (Dedon et al., 1991, Anal. Biochem.,
197:83). If DNA coprecipitates with a particular protein, this suggests that the DNA is associated with,
and presumably bound to, the protein. The coprecipitating DNA can be sequenced to identify the
bound sequence.

DNAse Hypersensitivity 

The transcriptionally active promoter region of a gene can be analyzed for susceptibility to
cleavage by DNAse I (Montecino et al., 1994, Biochemistry, 33:348). Efficient cleavage of genomic
DNA is dependent on the accessibility of this enzyme to the DNA, and is influenced by several
factors, including nucleosome packaging, overall chromatin configuration, and the presence of DNA
binding proteins such as transcription factors. DNA sequence variations within the promoter DNA
may have profound effects on these factors and result in aberrant regulation of gene transcription and
ultimately abnormal biological activity of the gene. Therefore, altered gene activity around a
polymorphic site can be detected as increased or decreased DNAse I hypersensitivity (Vaishnaw et
al., 1995, Immunogenetics, 41:354).

Assay for DNA Methylation

Accurate mapping of DNA methylation patterns, for example, in CpG islands which are
unmethylated regions of DNA, is used to investigate and gain a better understanding of diverse
biological processes such as the regulation of imprinted genes, X chromosome inactivation and tumor
suppressor gene silencing in human cancer. DNA methylation at specific sites is most frequently
studied by use of methylation-sensitive restriction endonucleases (for example HpaII) and Southern
blotting (Sambrook et al., supra). The sensitivity of this method can be enhanced several hundred-fold
by performing a ligation-mediated PCR step (as described in Steigerwald et al., 1990, Nucleic Acids
Res., 6:1435) after enzyme treatment. An alternative strategy, termed methylation-specific PCR
(Herman et al., 1996, Proc. Natl. Acad. Sci. USA, 93:9821), is used to determine the methylation status
of CpG islands without the use of methylation-specific restriction enzymes.

Histone Deacetylation

Transcription of chromatin-packaged genes involves highly regulated changes in nucleosome
structure that control DNA accessibility. Changes in nucleosome structure can be mediated by
enzymatic complexes which control the acetylation and deacetylation of histones. Transcription
elongation is required for the formation of the unfolded structure of transcribing nucleosomes, and
histone acetylation is required for the maintenance of these structures (Walia et al., 1998, J. Biol.
Chem., 3:14516). Deacetylation can be prevented by incubating cells with histone deacetylase
inhibitors such as sodium butyrate or trichostatin A. To assay for changes in acetylation and the state
of transcriptional activity, chromatin fractions are purified using organomercury and hydroxylapatite
dissociation chromatographic techniques (Walia et al., supra).

ii. Transcription Start Site

To determine if a particular polymorphism causes a change in the transcriptional start site of a
candidate gene, S1 nuclease mapping and primer extension can be performed. The presence of a
polymorphism may cause an mRNA to be aberrantly expressed. In particular, a polymorphism may
change the tissue specificity or developmental expression pattern of an mRNA species. A variety of
molecular methods for detecting mRNA known in the art can be performed to determine the
expression pattern of an mRNA. These methods include, but are not limited to, the following: Northern
blot analysis, RT-PCR, S1 analysis, RNase Protection analysis, or in situ hybridization analysis of
sections, wherein the samples are derived from multiple different tissues or from a tissue at different
stages of development. Northern blot analysis, RT-PCR and S1 analysis can also be used to determine
if a polymorphism results in an altered pattern of mRNA splicing.

Northern Blotting

The method of Northern blotting is well known in the art. This technique involves the transfer
of RNA from an electrophoresis gel to a membrane support to allow the detection of specific
sequences in RNA preparations.

Northern blot analysis is performed according to the following method. An RNA sample
(prepared by the addition of MOPS buffer, formaldehyde and formamide) is separated on an
agarose/formaldehyde gel in 1X MOPS buffer. Following staining with ethidium bromide and
visualization under ultraviolet light to determine the integrity of the RNA, the RNA is hydrolyzed by
treatment with 0.05 M NaOH/1.5 M NaCl followed by incubation with 0.5 M Tris-Cl (pH 7.4)/1.5 M
NaCl. The RNA is transferred to a commercially available nylon or nitrocellulose membrane (e.g.
Hybond-N membrane, Amersham, Arlington Heights, IL) by methods well known in the art (Ausubel
et al., supra; Sambrook et al., supra). Following transfer and UV cross linking, the membrane is
hybridized with a radiolabeled probe in hybridization solution (e.g. in 50% formamide/2.5%
Denhardt's/100-200 µg denatured salmon sperm DNA/0.1% SDS/5X SSPE) at 42°C. The
hybridization conditions can be varied as necessary as described in Ausubel et al., supra and
Sambrook et al., supra. Following hybridization, the membrane is washed at room temperature in 2X
SSC/0.1% SDS, at 42°C in 1X SSC/0.1% SDS, at 65°C in 0.2X SSC/0.1% SDS, and exposed to film.
The stringency of the wash buffers can also be varied depending on the amount of background signal
(Ausubel et al., supra).

RNase Protection Analysis 

RNase Protection analysis can be used to analyze RNA structure and amount and determine
the endpoint of a specific RNA.

The method of RNase protection is more sensitive than S1 analysis since it utilizes a sequence
specific hybridization probe that is labeled to a high specific activity. The probe is hybridized to sample
RNAs and treated with ribonuclease to remove free probe. Following ribonuclease treatment, the
fragments comprising probe annealed to homologous sequences in the sample RNA are recovered by
ethanol precipitation, and analyzed by electrophoresis on a sequencing gel. The presence of the target
mRNA is indicated by the presence of an appropriately sized fragment of the probe.

A probe is labeled by the method of in vitro transcription (in the presence of [α-32P]CTP as
described in Section B entitled "Production of a Polynucleotide Sequence"). The RNA sample to be
analyzed is ethanol precipitated and resuspended in 30 µl hybridization buffer (4 parts formamide/1
part 200 mM PIPES, pH 6.4, 2 M NaCl, 5 mM EDTA) containing 5 x 10^ cpm of the probe RNA.
The mixture is denatured 5 minutes at 85°C and incubated at the desired hybridization temperature
(30°C to 60°C) for >8 hours. To each reaction mixture is added 350 µl ribonuclease digestion buffer
(10 mM Tris-Cl, pH 7.5, 300 mM NaCl, 5 mM EDTA) containing 40 mg/ml ribonuclease A and 2
mg/ml ribonuclease T1. The sample is incubated for 30-60 minutes at 30°C. Following the addition of
10 µl 20% SDS and 2.5 µl 20 mg/ml proteinase K, the sample is incubated for 15 minutes at 37°C. The
sample is extracted with phenol/chloroform/isoamyl alcohol, ethanol precipitated, resuspended in RNA
loading buffer (80% (v/v) formamide, 1 mM EDTA, pH 8.0, 0.1% bromophenol blue, 0.1% xylene
cyanol), denatured and analyzed by electrophoresis on a denaturing polyacrylamide/urea gel and
autoradiography (Ausubel et al., supra).

Primer Extension 

The method of primer extension is used to map the 5' end of an RNA and to quantitate the 
amount of an RNA of interest by using reverse transcriptase to extend a primer that is complementary 
to a region of a given RNA. 

An oligonucleotide primer is labeled in a kinase reaction as described for S1 analysis. The
primer extension reaction is performed by mixing 10-50 µg total cellular RNA (in 10 µl) with 1.5 µl
10X hybridization buffer (1.5 M KCl, 0.1 M Tris-Cl, pH 8.3, 10 mM EDTA) and 3.5 µl labeled
oligonucleotide. Samples are heated to 65°C for 90 minutes and allowed to slow cool at room
temperature. To each sample is added 30 µl of primer extension reaction mixture (0.9 µl Tris-Cl, pH
8.3, 0.9 µl 0.5 M MgCl2, 0.25 µl DTT, 6.75 µl 1 mg/ml actinomycin D, 1.33 µl 5 mM 4dNTP mix, 20
µl H2O, 0.2 µl 25 U/µl AMV reverse transcriptase). Samples are incubated for 1 hour at 42°C, and
then, following the addition of 105 µl RNase reaction mix (100 mg/ml salmon sperm DNA, 20 mg/ml
RNase A), for 15 minutes at 37°C. Samples are extracted in phenol/chloroform/isoamyl alcohol,
ethanol precipitated, resuspended in stop/loading dye (20 mM EDTA, pH 8.0, 0.05% bromophenol
blue, 0.05% xylene cyanol in formamide), heated at 65°C and analyzed by electrophoresis on a 9%
acrylamide/7 M urea gel and autoradiography.

In Situ Hybridization 

Cytological techniques well known in the art can be used to determine the temporal and spatial
expression patterns of mRNA (in situ hybridization of tissue sections) and protein
(immunohistochemistry in individual cells).

Preparation of histological samples

Tissue samples intended for use in in situ detection of either RNA or protein are fixed using
conventional reagents; such samples may comprise whole or squashed cells, or sectioned tissue.
Fixatives useful for such procedures include, but are not limited to, formalin, 4% paraformaldehyde in
an isotonic buffer, formaldehyde (each of which confers a measure of RNAase resistance to the
nucleic acid molecules of the sample) or a multi-component fixative, such as FAAG (85% ethanol,
4% formaldehyde, 5% acetic acid, 1% EM grade glutaraldehyde). For the detection of RNA, water
used in the preparation of an aqueous component of a solution to which the tissue is exposed until it is
embedded is RNAase-free, i.e. treated with 0.1% diethylpyrocarbonate (DEPC) at room temperature
overnight and subsequently autoclaved for 1.5 to 2 hours. Tissue will be fixed at 4°C, either on a
sample roller or a rocking platform, for 12 to 48 hours in order to allow the fixative to reach the center
of the sample.

Prior to embedding, excess fixative will be removed and the sample will be dehydrated by a
series of two- to ten-minute washes in increasingly high concentrations of ethanol, beginning at 60%
and ending with two washes in 95% and another two in 100% ethanol, followed by two ten-minute
washes in xylene. Samples will be embedded in one of a variety of sectioning supports, e.g. paraffin,
plastic polymers or a mixed paraffin/polymer medium (e.g. Paraplast® Plus Tissue Embedding
Medium, supplied by Oxford Labware). For example, fixed, dehydrated tissue will be transferred from
the second xylene wash to paraffin or a paraffin/polymer resin in the liquid phase at about 58°C. The
paraffin or paraffin/polymer resin will be replaced three to six times over a period of approximately
three hours to dilute out residual xylene. The sample will be incubated overnight at 58°C under a
vacuum, in order to optimize infiltration of the embedding medium into the tissue. The next day,
following several additional changes of medium at 20 minute to one hour intervals, also at 58°C, the
tissue sample will be positioned in a sectioning mold, the mold will be surrounded by ice water and the
medium will be allowed to harden. Sections of 6 µm thickness will be taken and affixed to 'subbed'
slides, which are slides coated with a proteinaceous substrate material, usually bovine serum albumin
(BSA), to promote adhesion. Other methods of fixation and embedding are also applicable for use
according to the methods of the invention; examples of these are found in Humason, G.L., 1979,
Animal Tissue Techniques, 4th ed. (W.H. Freeman & Co., San Francisco), as is frozen sectioning
(Serrano et al., 1989, supra).

In situ Hybridization Analysis 

According to the method of in situ hybridization, a specifically labeled nucleic acid probe is
hybridized to cellular RNA present in individual cells or tissue sections. In situ hybridization can be
performed on either paraffin or frozen sections. Depending on the desired sensitivity and resolution,
either film or emulsion autoradiography can be utilized to detect the hybridized radioactive probe.

The following method of in situ hybridization is performed by incubating slides containing cell
or tissue specimens in a slide rack contained within a glass staining dish. According to this method, it is
preferable to use solutions that have been prepared fresh. Prior to the hybridization steps, slides are
dewaxed to remove the sectioning support material. The dewaxing protocol involves sequential
washes in xylene, rehydration by sequential washes in 100%, 95%, 70% and 50% ethanol, and
denaturation in 0.2 N HCl. Following a heat denaturation step (70°C in 2X SSC), samples are postfixed
in a freshly prepared solution of 4% PFA, washed in PBS, incubated in 10 mM DTT (10 min at 45°C)
and blocked in 400 ml PBS containing 0.617 g DTT, 0.74 g iodoacetamide and 0.5 g N-ethylmaleimide
for 30 min at 45°C in a water bath covered with aluminum foil, because iodoacetamide and
N-ethylmaleimide are light sensitive. The samples are washed in PBS and equilibrated sequentially in
freshly prepared 0.1 M triethanolamine (TEA buffer), TEA buffer/0.25% acetic anhydride, and TEA
buffer/0.5% acetic anhydride. Following a blocking step in 2X SSC, the samples are dehydrated by
sequential washes in 50%, 70%, 95%, and 100% ethanol and air dried. 35S-labeled riboprobes and
competitor probes prepared in the absence of a radiolabel (prepared as described in Section B entitled
"Production of a Polynucleotide Sequence") or double-stranded DNA probes (prepared with
[35S]dNTPs by methods well known in the art, including nick translation or random
oligonucleotide-primed synthesis) are heated to 100°C for 3 min and diluted to a final probe
concentration of 0.3 µg/ml in 50% formamide, 0.3 M NaCl, 10 mM Tris-Cl, pH 8.0, 1 mM EDTA, 1X Denhardt
solution, 500 µg/ml yeast tRNA, 500 µg/ml poly(A) (Pharmacia), 50 mM DTT, 10% polyethylene
glycol (MW 6000). The hybridization step is carried out by covering the sample with an appropriate
amount of probe and incubating for 30 min to 4 hours at 45°C in a chamber designed to prevent dilution
or concentration of the hybridization solution. Samples are washed sequentially at 55°C in solution A
(50% (v/v) formamide, 2X SSC, 20 mM 2-mercaptoethanol) and solution B (50% (v/v) formamide,
2X SSC, 20 mM 2-mercaptoethanol, 0.5% (v/v) Triton-X-100), and at room temperature in solution C
(2X SSC, 20 mM 2-mercaptoethanol). Following a 15 minute incubation with RNase, samples are
washed at 50°C in solution C and at room temperature in 2X SSC. Samples are dehydrated by
sequential washes in 50% ethanol/0.3 M ammonium acetate, 70% ethanol/0.3 M ammonium acetate,
95% ethanol/0.3 M ammonium acetate, and 100% ethanol. Slides are air dried and analyzed by film or
by emulsion autoradiography (Ausubel et al., supra).
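The masses given for the blocking solution can be checked against its volume with a short calculation. The sketch below is illustrative only: it takes the N-ethylmaleimide mass as 0.5 g, uses standard molecular weights that are not stated in the text, and the function name is hypothetical. Under those assumptions each of the three reagents works out to roughly 10 mM in 400 ml.

```python
def millimolar(mass_g, mol_weight, volume_ml):
    """Concentration in mM for a mass (g) dissolved in a volume (ml)."""
    if mol_weight <= 0 or volume_ml <= 0:
        raise ValueError("molecular weight and volume must be positive")
    moles = mass_g / mol_weight          # mol
    litres = volume_ml / 1000.0          # L
    return moles / litres * 1000.0       # mM


# Molecular weights (g/mol) are standard reference values, not from the text.
BLOCKING_SOLUTION = {
    "DTT": (0.617, 154.25),
    "iodoacetamide": (0.74, 184.96),
    "N-ethylmaleimide": (0.5, 125.13),   # mass assumed to be 0.5 g
}

for name, (mass_g, mw) in BLOCKING_SOLUTION.items():
    print(f"{name}: {millimolar(mass_g, mw, 400.0):.2f} mM")
```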

iii. mRNA Stability/Control of Turnover and mRNA Transcription Rate

Changes in mRNA stability/control of turnover and mRNA transcription rates due to the
presence of a polymorphism can be detected by the following methods.

mRNA Stability

Gene expression can be regulated by variations in mRNA stability (Liebhaber, 1997, Nucleic
Acids Symp Ser., 36:29 and Ross J. 1996, Trends Genet., 5:171). Any gene variation occurring within
the cis-acting elements which control mRNA abundance may influence gene expression levels (Peltz
et al., 1992, Curr Opin Cell Biol., 4:979). Quantitative RT-PCR (Kohler et al., 1995, Quantitation of
mRNA by polymerase chain reaction, Springer) and mRNA radiolabelling techniques are two methods
for measuring relative mRNA abundance and stability. Quantitative PCR employs an internal standard
to provide a direct comparison between alternative reactions, enabling comparison of low abundance
transcripts or transcripts derived from a sample that is only available in a limited quantity (McPherson
MJ et al., eds, 1995, PCR 2: A Practical Approach, IRL Press).
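The role of the internal standard can be illustrated with a small calculation. This is a minimal sketch, not a method taken from the cited references: the function names and the band intensities are hypothetical, and it assumes the target and standard signals come from the same reaction and are measured in the same arbitrary units.

```python
def relative_abundance(target_signal, standard_signal):
    """Target signal normalised to the co-amplified internal standard."""
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return target_signal / standard_signal


def fold_change(variant, reference):
    """Ratio of normalised abundances; each argument is (target, standard)."""
    return relative_abundance(*variant) / relative_abundance(*reference)


# Hypothetical band intensities (arbitrary units): (target, internal standard)
reference_allele = (1200.0, 1000.0)
variant_allele = (450.0, 900.0)

print(f"variant/reference abundance: {fold_change(variant_allele, reference_allele):.3f}")
```

Normalising each reaction to its own standard before comparing is what allows reactions with different overall amplification efficiency to be compared directly.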

Assay for mRNA Transcription Rates 

Genetic polymorphism within the regulatory regions of a gene can significantly alter
transcription rate and mRNA stability, resulting in reduced biological activity of the encoded protein.
One of the most sensitive assays for measuring the rate of gene transcription is the nuclear runoff
assay (Groudine and Casimir, 1984, Nucleic Acids Res 12: 1427). Nuclei isolated from cell lines
expressing the target gene of interest are treated with radiolabelled UTP and the level of incorporation
of radiolabel into nascent RNA transcripts is determined by filter hybridization to immobilized cDNA
derived from the target gene.

iv. Intracellular mRNA Localization 

A genetic variation can cause a change in the localization of a particular mRNA species (e.g.
to the cytoskeleton, or to the nuclear scaffold).

Immunohistochemistry

Changes in RNA localization can be detected by immunohistochemical methods well known in
the art (e.g. in situ analysis described above).

Oocyte Injection Assays

In many cases mRNA, like protein, is localized in relation to the polarity of the cell or the
cytoskeletal architecture (St. Johnston, 1995, Cell, 81:161). The Xenopus oocyte is a popular,
experimentally tractable system for studying intracellular trafficking of mRNA (Nakielny et al., 1997,
Annu. Rev. Neurosci., 20:269). Fluorescently labelled RNA is microinjected into the large oocyte cell,
where its location can be detected using standard microscopy methods. Polymorphic variants of a
particular mRNA species may differ in their response to cellular mechanisms responsible for
partitioning mRNA within the cell. This method has been useful for demonstrating that sequence
variations can affect subcellular localization (Grimm et al., 1997, EMBO J., 16:793).



v. Post-Translational Alterations

Post-translational alterations resulting from premature stop codons, translational readthrough
or multiple open reading frames and translational suppression may occur as a result of a
polymorphism. To detect post-translational alterations, a polynucleotide comprising one or more
polymorphisms is subjected to in vitro transcription and in vitro translation (as described in sections B
and J entitled "Production of a Polynucleotide Sequence" and "Preparation of a Labeled Protein").
The translation product(s) are analyzed for the appearance of aberrantly sized proteins. Additional
post-translational alterations that may occur as a result of a polymorphism include changes in
localization due to an altered signal sequence, and changes in glycosylation, myristoylation, and
susceptibility to or sites of proteolytic cleavage.

The method of immunocytochemistry can be used to determine if a protein is incorrectly
localized due to the presence of an altered signal sequence.

Immunohistochemistry

Immunohistochemical techniques, including indirect immunofluorescence, immunoperoxidase
labeling or immunogold labeling, are used for protein localization.

Immunofluorescent labeling of tissue sections (prepared as for in situ analysis, described
above) is performed by the following method. Slides containing the sample of interest are equilibrated
to room temperature, washed in PBS, incubated with an appropriate dilution of primary antibody (1
hour at room temperature), washed in PBS, incubated with an appropriate dilution of secondary
antibody (1 hour at room temperature), washed in PBS and analyzed under a microscope (Ausubel et
al., supra). Alternatively, the sensitivity of the immunohistochemical reaction is increased by using a
streptavidin-secondary antibody conjugate reacted with a biotin-fluorochrome conjugate. Alternatively,
immunogold labeling is used to detect a protein of interest by using an immunogold-conjugated
secondary antibody.

Immunoperoxidase labeling of tissue sections is performed by the following method. Slides are
pretreated in 0.25% hydrogen peroxide, incubated with primary antibody, washed in PBS and
incubated (1 hour at room temperature) with a specific secondary bridging antibody capable of
recognizing both the primary antibody and a horseradish peroxidase anti-peroxidase (PAP) complex.
The slides are washed in PBS and developed in diaminobenzidine substrate solution (0.03% (w/v) 3,3'-
diaminobenzidine in 200 ml PBS) at room temperature (Ausubel et al., supra).

Alternatively, protein localization is determined by cell fractionation wherein cells are
biosynthetically labeled, the labeled material is fractionated, and the radiolabeled proteins in each
fraction are analyzed by immunoprecipitation with an antibody specific for the protein of interest.

Assay for Glycosylation Inhibition

Changes in protein glycosylation can be detected by radiolabelling a protein of interest with
sugars, determining if a change in the cellular localization (by immunocytochemistry) of the protein in
culture has occurred due to aberrant glycosylation, or by determining the effects of inhibitors of
glycosylation on the migration pattern of proteins analyzed by polyacrylamide gel electrophoresis.

Post-translational glycosylation of proteins plays an important role in defining protein function
(Baenziger, 1994, FASEB J., 13:1019; Jacob, 1995, Curr. Opin. Struct. Biol., 5:605). Protein
glycosylation can be inhibited by tunicamycin, an antibiotic, as well as by several sugar analogues
(Schwarz, 1991, Behring Inst Mitt., 89:198). These reagents are used to characterize the effects of
sequence changes on protein glycosylation.

Assay for Post-Translational Modification with Lipids 

Changes in protein modification with lipids (e.g. myristoylation) are detected by radiolabelling a
protein of interest with myristic acid or by determining if a change in the cellular localization of the
protein in culture has occurred as a result of aberrant lipid modification (by immunocytochemistry).

Covalent attachment of lipids is a mechanism by which eukaryotic cells direct and, in some
cases, control membrane localization of proteins (Casey, 1994, Curr. Opin. Cell. Biol., 2:219). Such
post-translational addition of myristyl, palmityl or prenyl side-chains has a key role in the functional
regulation of many proteins (Chow et al., 1992, Curr. Opin. Cell. Biol., 4:629; Resh, 1994, Cell,
76:411). Assays for detecting proteins that are covalently modified by the attachment of lipids include
labeling with [3H]myristate (Stevenson et al., 1992, J. Exp. Med., 176:1053), or a combination of
enzymatic and chemical cleavage techniques performed in conjunction with tandem mass
spectrometry to determine sites of modification (Papac et al., 1992, J. Biol. Chem., 267:16889).

Proteolytic Cleavage 

Post-translational cleavage of polypeptides is an important mechanism for modulating protein
function in many physiological processes. Protease activity is involved in zymogen processing,
activation of enzyme catalysis, tissue/cell remodeling, signal transduction cascades, protein degradation
and cell death pathways (Rappay, 1989, Prog Histochem Cytochem., 18:1). A protein that is predicted
to be a protease or the target of a protease can be assayed in vitro using purified proteins or cell
extracts (Muta et al., 1995, J. Biol. Chem. 270:892) where cleavage efficiency is monitored by
standard PAGE or western blotting. Alternatively, proteases and/or their targets can be expressed
from expression plasmids in in vivo cell culture systems in order to monitor their biological activity
(Zhang et al., 1998, J. Biol. Chem. 273:1144). The specificity of proteolytic cleavage is determined
using inhibitors that selectively block serine, cysteine, aspartic and metallo proteolytic activity (e.g.
pepstatin A selectively inhibits aspartic proteases) (Rich et al., 1985, Biochemistry, 24:3165).

To determine if a protein has been modified such that the sites of proteolytic cleavage have
been altered, or susceptibility to proteolytic cleavage has changed, pulse chase experiments with
radiolabeled protein can be carried out to determine the precursor-product relationship following
digestion with a protease of a given specificity. The method of pulse chase labeling is described in
Ausubel et al., supra. Alternatively, inhibitors of proteases (e.g. acid proteases or serine proteases) can
be used to identify protease cleavage sites.

vi. Changes in Receptor Properties

If the gene of interest encodes a receptor protein, a polymorphism may modify the properties
of the receptor such that receptor binding/turnover or activation is altered. Receptor formation can be
impaired if a polymorphism causes improper receptor localization or assembly.

Receptor Localization

To determine if a receptor protein is being expressed at the proper location (e.g. nucleus,
cytoplasm, cell surface), the receptor can be localized by immunocytochemical techniques.
Alternatively, cells that are expressing the receptor can be fractionated and subjected to Western blot
analysis or biosynthetically labeled, fractionated and analyzed by immunoprecipitation.

Protein-Protein Interactions/In Vitro Assembly Assays for Receptors

A number of methods can be used to determine if a receptor is colocalized with the
appropriate protein partner.

The function of a protein may be dependent on the ability of the protein to interact with other
proteins as part of a large complex. For example, certain cell surface receptors consist of a receptor
complex that is composed of several homo- or heteromeric protein subunits, and activation by ligand
can result in altered protein-protein interactions both within the receptor complex and with
"downstream" targets such as G-proteins (Okada and Pessin, 1996, J. Biol. Chem., 271:25533).
Protein-protein interactions can be assayed immunologically by coimmunoprecipitation of native
(Gilboa et al., 1998, J. Biol. Chem., 140:767) or chemically cross-linked complexes (Haniu et al., 1997,
J. Biol. Chem., 272:25296), or through protein-protein mobility shift assays (Stern and Frieden, 1993,
Anal. Biochem., 212:221). If all of the components of a receptor complex have been identified, one
can employ in vitro reconstitution assays to assess whether a single protein alteration can affect the
functioning of the entire complex (Durovic et al., 1994, J. Biol. Chem., 269:30320).

Assay for In Vitro Assembly of Multimeric Protein Complexes

To determine whether these genetic variations have affected protein complex assembly,
experiments are carried out wherein recombinant mutant subunits are transfected into cells and
coexpressed with the other subunit components in vitro. Proper assembly is assessed by
immunoprecipitation of the protein complex in question with antibodies specific for the various
members of the complex followed by PAGE analysis (Koster et al., 1998, Biophys. J., 74:1821).

Assay for Receptor Binding/Turnover

Receptor-ligand interaction is essential for the functionality of the bound complex. Genetic
changes that alter either ligand or receptor can dramatically affect receptor binding, turnover, and
subsequent activation of downstream signaling events. Receptor binding/turnover can be measured by
standard Scatchard analysis of radiolabelled ligand binding in vitro (Culouscou et al., 1993, J. Biol.
Chem. 268:10458) or in cellular based assays (Greenland et al., 1993, J. Biol. Chem. 268:18103).
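As an illustration of the Scatchard analysis mentioned above, the following minimal sketch fits a straight line to bound/free versus bound ligand. For a single class of binding sites the slope of that line is -1/Kd and the x-intercept is Bmax. The function name and the data are hypothetical; the data are synthetic values generated from a one-site model with Kd = 2 and Bmax = 100 in arbitrary consistent units, not measurements from the cited studies.

```python
def scatchard_fit(bound, free):
    """Least-squares line through (bound, bound/free); returns (Kd, Bmax).

    Assumes a single class of non-interacting binding sites, for which
    bound/free = (Bmax - bound) / Kd.
    """
    if len(bound) != len(free) or len(bound) < 2:
        raise ValueError("need at least two paired measurements")
    x = list(bound)
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    kd = -1.0 / slope            # slope = -1/Kd
    bmax = -intercept / slope    # x-intercept of the fitted line
    return kd, bmax


# Synthetic one-site data (hypothetical): Kd = 2, Bmax = 100
free_ligand = [0.5, 1.0, 2.0, 5.0, 10.0]
bound_ligand = [100.0 * f / (2.0 + f) for f in free_ligand]

kd, bmax = scatchard_fit(bound_ligand, free_ligand)
print(f"Kd = {kd:.3f}, Bmax = {bmax:.3f}")
```

Comparing the fitted Kd and Bmax for a reference receptor and a polymorphic variant is one way such an analysis could flag altered binding; a curved Scatchard plot would indicate that the one-site assumption does not hold.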

Ligand Binding as Measured by Affinity Chromatography

Alternatively, affinity chromatography methods (well known in the art) can be employed to
determine if a receptor is demonstrating aberrant binding characteristics. According to the method of
affinity chromatography, receptor-ligand interactions are allowed to occur, and the binding efficiency
of receptor and ligand and/or turnover of receptor-ligand complexes is measured. Alternatively,
affinity chromatography can be used to isolate one or more components of a receptor-ligand
interaction for further analysis (March et al., 1974, Adv. Exp. Med. Biol., 42:3). The method of affinity
chromatography typically involves immobilizing on a solid support one component, for example a
known ligand for a receptor, and then incubating the immobilized ligand with radiolabelled protein
under optimal binding conditions. To measure the exact binding affinity of a given ligand-receptor pair,
an increasing amount of non-labeled competitor is added. This assay can be used to assess altered
binding efficiency resulting from the presence of a polymorphism in a protein of interest.
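A competition series of this kind is commonly summarized by the competitor concentration that displaces half of the labeled binding (IC50). The sketch below is an assumption-laden illustration rather than the method of the cited references: it estimates the IC50 by log-linear interpolation between the two data points that bracket 50% binding, and then applies the Cheng-Prusoff correction, a standard relation that is not named in the text, to obtain an inhibition constant. All function names and numbers are hypothetical.

```python
import math


def ic50_from_competition(competitor_conc, fraction_bound):
    """Interpolate, on a log10 concentration scale, the competitor
    concentration at which labeled binding falls to 50% of control."""
    points = list(zip(competitor_conc, fraction_bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_ic50 = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    raise ValueError("binding does not cross 50% within the tested range")


def ki_cheng_prusoff(ic50, labeled_conc, labeled_kd):
    """Cheng-Prusoff correction: Ki = IC50 / (1 + [L]/Kd)."""
    return ic50 / (1.0 + labeled_conc / labeled_kd)


# Hypothetical competition series: competitor concentration (nM) and
# fraction of labeled protein still bound relative to no competitor.
competitor_nM = [1.0, 10.0, 100.0, 1000.0]
bound_fraction = [0.90, 0.75, 0.25, 0.05]

ic50 = ic50_from_competition(competitor_nM, bound_fraction)
ki = ki_cheng_prusoff(ic50, labeled_conc=1.0, labeled_kd=1.0)
print(f"IC50 = {ic50:.2f} nM, Ki = {ki:.2f} nM")
```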

Receptor Activation Assays: Phosphorylation, Kinase Activity and Mitogenic Stimulation

Almost all signaling that occurs through cell surface receptors is regulated by phosphorylation,
a reversible post-translational event that occurs at specific amino acid residues and is catalyzed by a
protein kinase activity present within the receptor itself (autophosphorylation) or in trans via direct
interaction with an associated kinase (Hunter, 1997, Philos Trans R Soc Lond B Biol Sci., 353:583).
The specific effect of phosphorylation on a biological activity depends on the receptor, but often
results in modulation of endogenous receptor kinase activity or interaction with associated proteins,
which are also often kinases. The results of a phosphorylation event are passed on through a cascade
of protein kinases/phosphatases which ultimately affect downstream processes controlling gene
transcription, cell proliferation, metabolism, movement and differentiation (Patarca, 1996, Crit Rev
Oncog., 7:343). The biological function of a receptor is usually assayed in cell culture following
over-expression. The phosphorylated state of a receptor can be assayed directly by immunological methods
by employing an antibody that specifically recognizes a phosphorylated residue (Bangalore, 1992,
Proc Natl Acad Sci USA., 89:11637). Endogenous kinase activity associated with a receptor is
measured via the incorporation of radiolabelled phosphate in immunoprecipitated receptor complex
(Kazlauskas and Cooper, 1989, Cell 58:1121). "Downstream" events of receptor activity, including
mitogenic stimulation or MAP kinase activity, can be measured by tritiated thymidine incorporation (Lup
et al., 1996, Cancer Res. 56:4983), or by mobility-shift analysis of MAP kinase on western blots (Vietor,
1993, J. Biol Chem. 268:18994), respectively.

Immunocytochemical methods can be used to determine if a receptor-ligand complex is
correctly translocated to the nucleus. Alternatively, nuclear preparations (prepared as described
below) can be analyzed by Western blot or immunoprecipitation for the presence of the receptor
protein.

If a receptor is a transcriptional activator, the ability of the receptor to induce gene expression
can be measured by a variety of methods including Northern blot analysis, or reporter gene assays
wherein the promoter region isolated from a gene that is activated by the receptor regulates the
expression of a reporter protein.

vii. Enzyme Catalysis 

The gene of interest may encode a protein that has an enzymatic activity wherein the enzyme
catalyzes a reaction that is critical to the general metabolism of a cell. To determine if a mutated
protein is impaired in its enzymatic function, assays can be performed to measure the enzymatic
activity of the protein. There are many important enzymatic activities associated with normal cellular
metabolism, including: glycosidation, esterification, amidation, hydroxylation, acetylation, sulfonylation,
and alkylation. Each of these activities is assayed using in vitro methods employing overexpressed or
purified proteins, well known in the art (Eisenthal and Danson, 1992, Enzyme Assays: A Practical
Approach, Rickwood et al., Eds., IRL Press, Oxford, England).

The protein of interest may also be involved in various aspects of DNA synthesis or
replication. In vitro assays for the enzymatic reactions involved in DNA synthesis or replication (e.g.
polymerase, ligase, exonuclease or helicase activity) are known in the art. The biological activity of the
proteins catalyzing these reactions is assayed in vitro using standard enzymatic techniques (Adams,
199, DNA Replication: A Practical Approach I, Rickwood et al., Eds., IRL Press, Oxford, England).

If the protein of interest is involved in glycolysis or energy transport, assays for measuring
transporter activity or the activity of ATP dependent pumps are useful, according to the invention, for
determining if a mutated protein is impaired in these functions.

Transporter Activity 

Mammalian cells possess a variety of transporter systems, for example amino acid
transporters, which have overlapping substrate specificity (Van Winkle et al., 1993, Biochim Biophys
Acta, 1154:157). To determine if a polymorphism in a candidate gene of interest has altered the
function of the protein product of this gene as a molecular transporter, the full-length cDNA clone is
isolated by standard expression cloning strategies, and a change in activity of the full-length cRNA or
antisense cRNA upon microinjection into Xenopus laevis oocytes is determined by measuring changes
in influx/efflux transport of radiolabelled amino acid molecules (Broer et al., 1995, Biochem J., 312(Pt
3):863), neurotransmitters or their metabolites.

ATP-Dependent Pump Activity

Mammalian cells possess a variety of molecules that are categorized as ATP-binding cassette
or ATP-dependent transporters or pumps. These include the Na+-K+-ATPase ion pump, the calcium
uptake pump, (K+ + H+)-ATPase and the human multidrug resistance protein termed P-glycoprotein.
Alterations in pump activity are investigated by expressing the clone specific for the pump protein(s)
of interest in Xenopus oocytes, and performing tracer studies which measure the changes in
ATP-dependent uptake or extrusion of a radiolabelled substrate, and changes in the coupling ratios (e.g.
moles substrate transported/mole ATP hydrolyzed) (Shapiro et al., 1998, Eur. J. Biochem., 254:189).
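The coupling ratio defined above is a simple quotient, and a short sketch shows how a reference pump and a polymorphic variant might be compared. The function name and the tracer values are hypothetical and chosen only for illustration; they are not data from the cited study.

```python
def coupling_ratio(substrate_mol, atp_hydrolyzed_mol):
    """Moles of substrate transported per mole of ATP hydrolyzed."""
    if atp_hydrolyzed_mol <= 0:
        raise ValueError("ATP hydrolyzed must be positive")
    return substrate_mol / atp_hydrolyzed_mol


# Hypothetical tracer measurements per oocyte (mol)
reference = coupling_ratio(substrate_mol=3.0e-9, atp_hydrolyzed_mol=1.0e-9)
variant = coupling_ratio(substrate_mol=2.1e-9, atp_hydrolyzed_mol=1.0e-9)

print(f"reference: {reference:.2f}, variant: {variant:.2f}")
print(f"variant retains {100.0 * variant / reference:.0f}% of reference coupling")
```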

viii. Ion Channel

The gene of interest may encode a protein that is a component of an ion channel.
Immunocytochemical methods can be used to determine if an ion channel protein demonstrates the
appropriate cell type specificity.

The activity of an ion channel can be measured by electrophysiological methods in oocytes.
Alternatively, the sensitivity of ion channel activity to a particular inhibitor can be determined.

Assays for Ion Channel Activity in Oocytes

Polymorphisms which alter ion channel function and regulation are studied using the oocytes
of Xenopus laevis. Injection of the oocytes with exogenous in vitro transcribed mRNA results in the
production and functional expression of foreign membrane proteins, including voltage- and
neurotransmitter-operated ion channels (Dascal et al., 1987, CRC Crit Rev Biochem., 22(4):317).
Changes in the oocyte transmembrane current in response to expression of an exogenous mRNA are
measured. This technique has been improved by the development of rapid superfusion systems that
utilize a dual role perfusion micropipette that controls internal solution as well as monitoring voltage
(Costa et al., 1994, Biophys J., 67:395). This technology represents a useful system for studying
various aspects of ion channels encoded by foreign mRNAs, including channel expression,
single-channel behavior, and the response of channels to the action of pharmacologically active substances
(Sigel, 1987, J. Physiol., 386:73).

Patch Clamp Assays for Ion Channel Activity

The function of individual channel proteins is determined by the high resolution patch clamp
technique. This technique (which is useful in a variety of cell types, including Xenopus oocytes
described above) involves measuring changes in transmembrane current across the cell membrane in
vitro (Sachs et al., 1983, Methods Enzymol., 103:147). Processes such as signaling, secretion, and
synaptic transmission are examined at the cellular level by the patch clamp method. The gene
expression pattern and protein structure of ionic channels can be determined by combining information
derived from high-resolution electrophysiological recordings obtained by the patch clamp method with
molecular biological analysis (Liem et al., 1995, Neurosurgery, 36:382).

A polymorphic variation in a gene that encodes a protein that is a member of a multimeric
protein complex, such as an ion channel or a cytoskeletal structural component, can alter the assembly
and function of the multimeric protein complex (Lee et al., 1994, Biophys J., 66:667). A gene variation
may affect protein-protein interaction, or disrupt the production of components of a multimeric
complex, thereby disrupting stoichiometry and consequently decreasing stability.

Assay for In Vitro Assembly of Multimeric Protein Complexes

In vitro assembly assays (described above) can be performed to determine if a polymorphism
has affected the assembly of an ion channel.

ix. Cellular Properties

The influence of a polymorphism on general aspects of cell behavior, including cell
morphology, adhesive properties, differentiation and proliferation, can be assessed using a combination
of methods including microscopic observation of cell cultures (Azuma et al., 1994, Histol. Histopathol.,
9:781), immunohistochemistry, and FACS analysis techniques (Beesley, 1993, Immunocytochemistry: a
Practical Approach, Rickwood et al., (Eds), IRL Press and Ormerod, 1994, Flow Cytometry: a
Practical Approach, Rickwood et al., (Eds), IRL Press, Oxford, England).

Assays for Measuring Apoptosis 

Apoptosis has been implicated in the etiology and pathophysiology of a variety of human
diseases. Gene variants which influence the process of apoptosis can be assessed by a variety of
methods of analysis involving either the tissues or cells (Allen et al., 1997, J Pharmacol Toxicol
Methods, 37:215). Cell cultures expressing the gene variants of interest are analyzed using Annexin V,
which interacts strongly with phosphatidylserine residues that have been exposed as a result of plasma
membrane breakdown occurring in the early stages of apoptosis. Either vital or fixed material can be
analyzed by Annexin V labeling in combination with microscopy and flow cytometry detection
methods (van Engeland et al., 1998, Cytometry, 31:1). TdT-mediated deoxyuridine triphosphate
(dUTP)-biotin nick end-labeling (TUNEL) is a preferred method for specific staining of apoptotic cells
in histological sections and cytology specimens (Labat-Moleur et al., 1998, J. Histochem Cytochem.,
46:327; Sasano et al., 1998, Diagn Cytopathol., 18:398). Apoptosis is also detected by quantification of
DNA fragmentation by ethidium bromide staining and gel electrophoresis, or by the use of saturation
labeling of 3' ends of DNA fragments (Peng and Liu, 1997, Lab Invest., 77:547).

Assay for In Vivo Receptor Function: Growth Cone Guidance Assay 

Activation of cell-surface receptors can result in the stimulation of cell motility. There are
many different families of signaling molecules, for example the netrins (Serafini et al., 1994, Cell, 78:
409), which are responsible for both contact-mediated and chemo-mediated attraction and repulsion of
migrating cells. A classic model for this activity is the trajectory that the leading edge "growth cone"
takes when a neuron is stimulated to grow out from explanted neural tissue in cell culture (Goodman,
1996, Annu Rev Neurosci. 19:341). Ligands present in the culture medium or immobilized on a
substrate bind to receptors on the cell surface of the growth cone and trigger second-messenger
signals, thereby dictating an appropriate steering response. The biological activity of such receptors or
ligands can be measured by overexpressing the receptor or ligand protein in culture and then
monitoring growth cone guidance (Kremoser et al., 1995, Cell 82:359). Attraction or repulsion of cells
that is observed to differ from normal is an indication of the role of this protein in growth
guidance, and identifies the polymorphisms as altering function.

X. Changes in gene expression or protein function that result from the presence of a
polymorphism can be detected by in vivo assays including the production of transgenic animals, knock
out animals or the analysis of naturally occurring animal models of a particular disease.



Transgenic Animals

Transgenic mice provide a useful tool for genetic and developmental biology studies and for
the determination of the function of a novel sequence. According to the method of conventional
transgenesis, additional copies of normal or modified genes are injected into the male pronucleus of the
zygote and become integrated into the genomic DNA of the recipient mouse. The transgene is
transmitted in a Mendelian manner in established transgenic strains.
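
As an illustration of Mendelian transmission, the following Python sketch enumerates the expected offspring of a cross between two transgenic lines. It is a minimal sketch under stated assumptions: each line is hemizygous for a single transgene, the two integration sites are unlinked, and all gametes are equally viable. The allele labels "TgA" and "TgB" and the function names are hypothetical and do not come from the specification.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """List the equally likely gametes of a parent.

    A genotype is a list of loci, each locus a pair of alleles.
    Unlinked loci are assumed, so alleles assort independently.
    """
    return list(product(*genotype))


def cross(parent_a, parent_b):
    """Return expected offspring genotype frequencies as exact fractions."""
    counts = Counter()
    for ga, gb in product(gametes(parent_a), gametes(parent_b)):
        # Sort alleles within each locus so allele order does not matter.
        offspring = tuple(tuple(sorted(pair)) for pair in zip(ga, gb))
        counts[offspring] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def fraction_carrying(frequencies, alleles):
    """Sum the frequencies of offspring that carry every listed allele."""
    return sum(
        freq
        for genotype, freq in frequencies.items()
        if all(any(a in locus for locus in genotype) for a in alleles)
    )


# Hypothetical lines: "-" marks the absence of a transgene at that site.
line_a = [("TgA", "-"), ("-", "-")]  # hemizygous for transgene A
line_b = [("-", "-"), ("TgB", "-")]  # hemizygous for transgene B

frequencies = cross(line_a, line_b)
print("carrying both transgenes:", fraction_carrying(frequencies, ["TgA", "TgB"]))
```

Under these assumptions one quarter of the offspring are expected to carry both transgenes, which is the class of interest when the combined effect of two polymorphisms is studied.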

Constructs useful for creating transgenic animals comprise genes under the control of either
their normal promoters or an inducible promoter, reporter genes under the control of promoters to be
analyzed with respect to their patterns of tissue expression and regulation, and constructs containing
dominant mutations, mutant promoters, and artificial fusion genes to be studied with regard to their
specific developmental outcome. Transgenic mice are useful according to the invention for analysis of
the dominant effects of overexpressing a candidate gene in mouse. Typically, DNA fragments on the
order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998, New Anat.,
253:19). Transgenic animals can be created with a construct comprising a candidate gene containing
one or more polymorphisms according to the invention. Alternatively, a transgenic animal expressing a
candidate gene containing a single polymorphism can be crossed to a second transgenic animal
expressing a candidate gene containing a different polymorphism and the combined effects of the two
polymorphisms can be studied in the offspring animals. Transgenic mice engineered to overexpress a
number of genes, including PCK1 (Valera et al., 1994, Proc. Natl. Acad. Sci. USA, 91: 9151), DSTS
(Mitanchez et al., FEBS Letters, 421: 285), IAPP (D'Alessio et al., 1994, Diabetes, 43:1457), Asp
(Klebig et al., Proc. Natl. Acad. Sci. USA, 92: 4728) and Agrt (Graham et al., Nature Genetics,

Knock Out Animals

i. Standard

Knock out animals are produced by the method of creating gene deletions with homologous
recombination. This technique is based on the development of embryonic stem (ES) cells that are
derived from embryos, are maintained in culture and have the capacity to participate in the
development of every tissue in the mouse when introduced into a host blastocyst. A knock out animal
is produced by directing homologous recombination to a specific target gene in the ES cells, thereby
producing a null allele of the gene. The potential phenotypic consequences of this null allele (either in
heterozygous or homozygous offspring) can be analyzed (Reeves, supra). Single or double knock out
mice that may be useful for studying osteoarthritis have been produced for a number of genes
including IRS1 (Araki et al., 1994, Nature, 372:186; Tamemoto et al., 1994, Nature, 372:182), IRS2
(Withers et al., 1998, Nature, 391:900), INSR, BIRKO, MIRKO, INSR (Lamothe et al., 1998, FEBS
Letters, 426:381), GLUT2, GLUT4 (Katz et al., 1995, Nature, 377:151), GLP1R (Gallwitz and Schmidt,
1997, Z. Gastroenterol., 35:655), GCK (Sakura et al., 1998, Diabetologia, 41:654), GCK/IRS1,
IRS1/INSR, MC4R (Huszar et al., 1997, Cell, 88:131) and BRS3 (Ohki-Hamazaki et al., 1997,
Nature, 390:165).

ii. In vivo Tissue Specific Knock Out in Mice Using Cre-lox

The method of targeted homologous recombination has been improved by the development of
a system for site-specific recombination based on the bacteriophage P1 site-specific recombinase Cre.
The Cre-loxP site-specific DNA recombinase from bacteriophage P1 is used in transgenic mouse
assays in order to create gene knockouts restricted to defined tissues or developmental stages.
Regionally restricted genetic deletion, as opposed to global gene knockout, has the advantage that a
phenotype can be attributed to a particular cell/tissue (Marth, 1996, J. Clin. Invest. 97: 1999). In the
Cre-loxP system one transgenic mouse strain is engineered such that loxP sites flank one or more exons of
the gene of interest. Homozygotes for this so-called 'floxed gene' are crossed with a second
transgenic mouse that expresses the Cre gene under control of a cell/tissue type-specific transcriptional
promoter. Cre protein then excises DNA between loxP recognition sequences and effectively
removes target gene function (Sauer, 1998, Methods, 14:381). There are now many in vivo examples
of this method, including the inducible inactivation of mammary tissue specific genes (Wagner et al.,
1997, Nucleic Acids Res., 25:4323).
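
The excision step of the Cre-loxP system can be illustrated with a short Python sketch operating on sequence strings. This is a simplified model, not a description of any laboratory protocol: it assumes two loxP sites in the same orientation on the strand shown, uses the commonly cited 34 bp loxP sequence, and treats excision as removal of the flanked segment together with one loxP copy. The function name and the example sequences are hypothetical.

```python
# Commonly cited 34 bp loxP site: 13 bp repeat, 8 bp spacer, 13 bp inverted repeat.
LOXP = "ATAACTTCGTATA" + "GCATACAT" + "TATACGAAGTTAT"


def cre_excise(sequence, site=LOXP):
    """Model Cre-mediated excision between two directly repeated loxP sites.

    Returns the sequence with the floxed segment and one loxP copy removed,
    leaving a single loxP site. If fewer than two sites are found on this
    strand, the sequence is returned unchanged.
    """
    first = sequence.find(site)
    if first == -1:
        return sequence
    second = sequence.find(site, first + len(site))
    if second == -1:
        return sequence
    return sequence[: first + len(site)] + sequence[second + len(site):]


# Hypothetical floxed allele: an "exon" placeholder flanked by loxP sites.
upstream, exon, downstream = "AAAA", "GGGGCCCC", "TTTT"
floxed_allele = upstream + LOXP + exon + LOXP + downstream

recombined = cre_excise(floxed_allele)
print(exon in floxed_allele, exon in recombined)
```

In a tissue that does not express Cre the floxed allele is left intact, which corresponds to calling nothing at all in this model; only cells expressing Cre carry the recombined allele.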

iii. BAC Rescue of Knock Out Phenotype

In order to verify that a particular genetic polymorphism/mutation is responsible for altered
protein function in vivo one can "rescue" the altered protein function by introducing a wild-type copy
of the gene in question. In vivo complementation with bacterial artificial chromosome (BAC) clones
expressed in transgenic mice can be used for these purposes. This method has been used for the
identification of the mouse circadian Clock gene (Antoch et al., 1997, Cell 89: 655).

iv. Naturally Occurring Animal Models

Naturally occurring animal models useful for studying osteoarthritis include models of severe
hyperglycaemia (celebes black ape, Chinese hamster, diabetes mouse (db), Djungarian hamster,
Egyptian sand rat, Hartley guinea pig, OLETF rat, New Zealand white rabbit, obese BBZ/Wor rat,
rhesus monkey, South African hamster, spiny mouse), models for moderate hyperglycaemia (Cohen
diabetic rat, GK rat, Japanese KK mouse, male Bristol CBA/Ca mouse, male eSS rat, male WKY
fatty rat, male Wistar WBN/Kob rat, male ZDF rat, NZO mouse, obese mouse (ob), PBB/Ld mouse,
spontaneously hypertensive corpulent (SHR/N-cp) rat, Tuco-tuco, Wellesley hybrid mouse, yellow
obese mouse) and impaired glucose tolerance (ageing laboratory rats and mice, BHE rat, fatty Zucker
rat (fa), Mongolian gerbil, NON diabetic mouse, squirrel monkey, Yucatan miniature swine) (Pickup
and Williams, eds., Textbook of Diabetes, 2nd Edition, Blackwell Science).



G. Production of an Amplified Product 

Amplified products useful according to the invention can be prepared by utilizing the method
of PCR as described in Section B entitled "Production of a Polynucleotide Sequence". Primers useful
for producing an amplified product according to the invention (e.g. an amplified product comprising
one or more polymorphisms) can be designed and synthesized as described in Section A entitled
"Design and Synthesis of Oligonucleotide Primers".

The invention provides methods (e.g. Southern blot analysis, PCR, primer extension and 
oligonucleotide hybridization), of detecting a polymorphism in an amplified product. 

H. Production of a Mutant Protein 

1. Expression of the Nucleotide Sequence 

In accordance with the present invention, polynucleotide sequences which encode candidate
gene protein fragments, fusion proteins or functional equivalents thereof may be used in recombinant
DNA molecules that direct the expression of a candidate gene protein in appropriate host cells. Due to
the inherent degeneracy of the genetic code, other DNA sequences which encode substantially the
same or a functionally equivalent amino acid sequence may be used to clone and express the
candidate gene protein. As will be understood by those of skill in the art, it may be advantageous to
produce candidate gene-encoding nucleotide sequences possessing non-naturally occurring codons.
Codons preferred by a particular prokaryotic or eukaryotic host (Murray et al., 1989, Nucleic Acids
Res 17:477) can be selected, for example, to increase the rate of protein expression or to produce
recombinant RNA transcripts having desirable properties, such as a longer half-life as compared to
transcripts produced from the naturally occurring sequence.
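
The use of the degeneracy of the genetic code to substitute host-preferred codons can be sketched in Python as follows. This is an illustrative sketch only: the standard genetic code table is built programmatically, while the entries of PREFERRED_CODONS are hypothetical placeholders chosen for the example and are not measured codon usage data for any host. The function names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical host preferences (placeholders, not real usage statistics).
PREFERRED_CODONS = {"L": "CTG", "R": "CGT", "K": "AAA"}


def translate(dna):
    """Translate a coding sequence whose length is a multiple of three."""
    if len(dna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna, preferred):
    """Replace codons with preferred synonymous codons, keeping the protein."""
    for aa, codon in preferred.items():
        if CODON_TABLE[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons)


original = "ATGTTAAGAAAGTAA"  # hypothetical coding sequence
optimized = recode(original, PREFERRED_CODONS)
assert translate(optimized) == translate(original)
print(original, "->", optimized)
```

The final assertion checks the property the paragraph relies on: the nucleotide sequence changes while the encoded amino acid sequence does not.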

The nucleotide sequences of the present invention can be engineered in order to alter a
candidate gene-encoding sequence for a variety of reasons, including but not limited to, alterations
which modify the cloning, processing and/or expression of the gene product. For example, mutations
may be introduced using techniques which are well known in the art, e.g., site-directed mutagenesis to
insert new restriction sites, to alter glycosylation patterns, to change codon preference or to produce
splice variants.

In another embodiment of the invention, a natural, modified or recombinant candidate gene
protein-encoding sequence may be ligated to a heterologous sequence to encode a fusion protein (as
described in Section B entitled "Production of a Polynucleotide Sequence"). For example, for
screening of peptide libraries for inhibitors of candidate gene protein activity, it may be useful to
encode a chimeric protein that is recognized by a commercially available antibody. A fusion protein
may also be engineered to contain a cleavage site located between a candidate protein and the
heterologous protein sequence, so that the protein of interest may be substantially purified away from
the heterologous moiety following cleavage.

In another embodiment of the invention, the sequence encoding the candidate gene protein
may be synthesized, whole or in part, using chemical methods well known in the art (see Caruthers et
al., 1980, Nuc Acids Res Symp Ser, 7:215; Horn et al., 1980, Nuc Acids Res Symp Ser, 225, etc.).
Alternatively, the protein itself, or a portion thereof, could be produced using chemical methods of
synthesis. For example, peptide synthesis can be performed using various solid-phase techniques
(Roberge et al., 1995, Science, 269:202) and automated synthesis may be achieved, for example, using
the ABI 431A Peptide Synthesizer (Perkin Elmer) in accordance with the instructions provided by the
manufacturer.

The newly synthesized peptide can be substantially purified by preparative high performance
liquid chromatography (e.g., Creighton, 1983, Proteins, Structures and Molecular Principles, WH
Freeman and Co., New York NY). The composition of the synthetic peptides may be confirmed by
amino acid analysis or sequencing (e.g., the Edman degradation procedure; Creighton, supra).
Additionally the amino acid sequence of interest, or any part thereof, may be altered during direct
synthesis and/or combined using chemical methods with sequences from other proteins, or any part
thereof, to produce a variant polypeptide.



2. Expression Systems 

In order to express a biologically active protein, the nucleotide sequence encoding the protein
of interest or its functional equivalent is inserted into an appropriate expression vector, i.e., a vector
which contains the necessary elements for the transcription and translation of the inserted coding
sequence.

Methods which are well known to those skilled in the art can be used to construct expression
vectors containing a protein-encoding sequence and appropriate transcriptional or translational
controls. These methods include in vivo recombination or genetic recombination. Such techniques are
described in Ausubel et al., supra and Sambrook et al., supra.

A variety of expression vector/host systems may be utilized to contain and express a protein
product of a candidate gene according to the invention. These include but are not limited to
microorganisms such as bacteria transformed with recombinant bacteriophage, plasmid or cosmid
DNA expression vectors; yeast transformed with yeast expression vectors; insect cell systems
infected with virus expression vectors (e.g., baculovirus); plant cell systems transfected with virus
expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed
with bacterial expression vectors (e.g., Ti or pBR322 plasmid); or animal cell systems.

The "control elements" or "regulatory sequences" of these systems vary in their strength and
specificities and are those nontranslated regions of the vector, enhancers, promoters, and 3'
untranslated regions, which interact with host cellular proteins to carry out transcription and
translation. Depending on the vector system and host utilized, any number of suitable transcription and
translation elements, including constitutive and inducible promoters, may be used. For example, when
cloning in bacterial systems, inducible promoters such as the hybrid lacZ promoter of the Bluescript®
phagemid (Stratagene, La Jolla CA) or pSport1 (Gibco BRL) and ptrp-lac hybrids and the like may be
used. The baculovirus polyhedrin promoter may be used in insect cells. Promoters or enhancers
derived from the genomes of plant cells (e.g., heat shock, RUBISCO, and storage protein genes) or
from plant viruses (e.g. viral promoters or leader sequences) may be cloned into the vector. In
mammalian cell systems, promoters from mammalian genes or from mammalian viruses are most
appropriate. If it is necessary to generate a cell line that contains multiple copies of the sequence
encoding the protein product of the gene of interest, vectors based on SV40 or EBV may be used with
an appropriate selectable marker.

In bacterial systems, a number of expression vectors may be selected depending upon the use
intended for the protein of interest. For example, when large quantities of a protein are required for the
production of antibodies, vectors which direct high level expression of fusion proteins that are readily
purified may be desirable. Such vectors include, but are not limited to, the multifunctional E. coli
cloning and expression vectors such as Bluescript® (Stratagene), in which the sequence encoding the
protein of interest may be ligated into the vector in frame with sequences encoding the amino-terminal
Met and the subsequent 27 residues of β-galactosidase so that a hybrid protein is produced; pIN
vectors (Van Heeke & Schuster, 1989, J Biol Chem 264:5503); and the like. pGEX vectors (Promega,
Madison WI) may also be used to express foreign polypeptides as fusion proteins with GST. In
general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to
glutathione-agarose beads followed by elution in the presence of free glutathione. Proteins made in
such systems are designed to include heparin, thrombin or factor XA protease cleavage sites so that
the cloned polypeptide of interest can be released from the GST moiety at will.

In the yeast, Saccharomyces cerevisiae, a number of vectors containing constitutive or
inducible promoters such as alpha factor, alcohol oxidase and PGH may be used. For reviews, see
Ausubel et al. (supra) and Grant et al., 1987, Methods in Enzymology 153:516.

In cases where plant expression vectors are used, the expression of a sequence encoding a
protein of interest may be driven by any of a number of promoters. For example, viral promoters such
as the 35S and 19S promoters of CaMV (Brisson et al., 1984, Nature 310:511) may be used alone or
in combination with the omega leader sequence from TMV (Takamatsu et al., 1987, EMBO J 6:307).
Alternatively, plant promoters such as the small subunit of RUBISCO (Coruzzi et al., 1984, EMBO J
3:1671; Broglie et al., 1984, Science, 224:838); or heat shock promoters (Winter J and Sinibaldi RM,
1991, Results Probl Cell Differ., 17:85) may be used. These constructs can be introduced into plant
cells by direct DNA transformation or pathogen-mediated transfection. For reviews of such techniques,
see Hobbs S or Murry LE in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill,
New York NY, pp 191-196 or Weissbach and Weissbach (1988) Methods for Plant Molecular
Biology, Academic Press, New York, pp 421-463.

An alternative expression system which could be used to express a protein of interest is an
insect system. In one such system, Autographa californica nuclear polyhedrosis virus (AcNPV) is
used as a vector to express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae.
The sequence encoding the protein of interest may be cloned into a nonessential region of the virus,
such as the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful insertion
of the sequence encoding the protein of interest will render the polyhedrin gene inactive and produce
recombinant virus lacking coat protein. The recombinant viruses are then used to infect S.
frugiperda cells or Trichoplusia larvae in which the protein of interest is expressed (Smith et al.,
1983, J Virol 46:584; Engelhard et al., 1994, Proc Natl Acad Sci 91:3224).

In mammalian host cells, a number of viral-based expression systems may be utilized. In cases
where an adenovirus is used as an expression vector, a sequence encoding the protein of interest may
be ligated into an adenovirus transcription/translation complex consisting of the late promoter and
tripartite leader sequence. Insertion in a nonessential E1 or E3 region of the viral genome will result in
a viable virus capable of expressing in infected host cells (Logan and Shenk, 1984, Proc Natl Acad
Sci, 81:3655). In addition, transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer,
may be used to increase expression in mammalian host cells.

Specific initiation signals may also be required for efficient translation of a sequence encoding
the protein of interest. These signals include the ATG initiation codon and adjacent sequences. In
cases where the sequence encoding the protein, its initiation codon and upstream sequences are
inserted into the most appropriate expression vector, no additional translational control signals may be
needed. However, in cases where only coding sequence, or a portion thereof, is inserted, exogenous
translational control signals including the ATG initiation codon must be provided. Furthermore, the
initiation codon must be in the correct reading frame to ensure translation of the entire insert.
Exogenous translational elements and initiation codons can be of various origins, both natural and
synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers appropriate to
the cell system in use (Scharf et al., 1994, Results Probl Cell Differ, 20:125; Bittner et al., 1987,
Methods in Enzymol, 153:516).
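
The reading-frame requirement for an exogenous initiation codon can be checked with a few lines of Python. The sketch below is illustrative only: it assumes a construct given as a single DNA string, an ATG at a known position, and an insert at a known position, and it considers only the three standard stop codons. The function names and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_read_from(construct, atg_index):
    """Return the codons read from the ATG up to, not including, the first stop."""
    if construct[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG initiation codon at the given position")
    codons = []
    for i in range(atg_index, len(construct) - 2, 3):
        codon = construct[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return codons


def insert_fully_read(construct, atg_index, insert_start, insert_length):
    """True if the insert is in frame with the ATG and read through to its end."""
    if (insert_start - atg_index) % 3 != 0:
        return False  # the insert would be read in the wrong frame
    read_end = atg_index + 3 * len(codons_read_from(construct, atg_index))
    return read_end >= insert_start + insert_length


# Hypothetical vector leader carrying the exogenous ATG, insert, and stop codon.
leader = "GGATCCATGGCT"   # ATG begins at index 6
insert = "GAAACCGGT"      # coding sequence supplied without its own ATG
construct = leader + insert + "TAA"

print(insert_fully_read(construct, 6, len(leader), len(insert)))

# Removing one base from the leader shifts the insert out of frame.
shifted_leader = leader[:-1]
shifted = shifted_leader + insert + "TAA"
print(insert_fully_read(shifted, 6, len(shifted_leader), len(insert)))
```

A single-base shift between the initiation codon and the insert is enough to fail the check, which is why the initiation codon must be placed in the correct frame when only coding sequence is inserted.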

In addition, a host cell strain may be chosen for its ability to modulate the expression of the
inserted sequences or to process the expressed protein in the desired fashion. Such modifications of
the polypeptide include but are not limited to, acetylation, carboxylation, glycosylation, phosphorylation,
lipidation and acylation. Post-translational processing which cleaves a "prepro" form of the protein
may also be important for correct insertion, folding and/or function. Different host cells such as CHO,
HeLa, MDCK, 293, WI38, etc. have specific cellular machinery and characteristic mechanisms for
such post-translational activities and may be chosen to ensure the correct modification and processing
of the introduced, foreign protein.

For long-term, high-yield production of recombinant proteins, stable expression is preferred.
For example, cell lines which stably express a foreign protein may be transformed using expression
vectors which contain viral origins of replication or endogenous expression elements and a selectable
marker gene. Following the introduction of the vector, cells may be allowed to grow for 1-2 days in an
enriched media before they are switched to selective media. The purpose of the selectable marker is
to confer resistance to selection, and its presence allows growth and recovery of cells which
successfully express the introduced sequences. Resistant clumps of stably transformed cells can be
expanded using tissue culture techniques appropriate to the cell type.

Any number of selection systems may be used to recover transformed cell lines. These
include, but are not limited to, the herpes simplex virus thymidine kinase (Wigler et al., 1977, Cell
11:223) and adenine phosphoribosyltransferase (Lowy et al., 1980, Cell 22:817) genes which can be
employed in tk- or aprt- cells, respectively. Also, antimetabolite, antibiotic or herbicide resistance can
be used as the basis for selection; for example, dhfr, which confers resistance to methotrexate (Wigler
et al., 1980, Proc Natl Acad Sci 77:3567); npt, which confers resistance to the aminoglycosides
neomycin and G-418 (Colbere-Garapin et al., 1981, J Mol Biol, 150:1); and als or pat, which confer
resistance to chlorsulfuron and phosphinothricin acetyltransferase, respectively (Murry, supra).
Additional selectable genes have been described, for example, trpB, which allows cells to utilize indole
in place of tryptophan, or hisD, which allows cells to utilize histidinol in place of histidine (Hartman and
Mulligan, 1988, Proc Natl Acad Sci 85:8047). Recently, the use of visible markers has gained
popularity with such markers as anthocyanins, β-glucuronidase and its substrate, GUS, and luciferase
and its substrate, luciferin, being widely used not only to identify transformants, but also to quantify the
amount of transient or stable protein expression attributable to a specific vector system (Rhodes et al.,
1995, Methods Mol Biol 55:121).

3. Identification of Transformants Containing the Polynucleotide Sequence 
Although the presence/absence of marker gene expression suggests that the gene of interest
is also present, its presence and expression should be confirmed. For example, if the sequence
encoding a foreign protein is inserted within a marker gene sequence, recombinant cells containing the
sequence encoding the foreign protein can be identified by the absence of marker gene function.
Alternatively, a marker gene can be placed in tandem with the sequence encoding the foreign protein
under the control of a single promoter. Expression of the marker gene in response to induction or
selection usually indicates expression of the tandem sequences as well.

Alternatively, host cells which contain the coding sequence for a protein of interest and
express the protein of interest may be identified by a variety of procedures known to those of skill in
the art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA hybridization and
protein bioassay or immunoassay techniques which include membrane, solution, or chip based
technologies for the detection and/or quantification of the nucleic acid or protein.

The presence of the polynucleotide sequence encoding the protein of interest can be detected
by DNA-DNA or DNA-RNA hybridization or amplification using probes, portions or fragments of the
sequence encoding the foreign protein of interest.

A variety of protocols for detecting and measuring the expression of the foreign protein, using
either polyclonal or monoclonal antibodies specific for the protein, are known in the art. Examples
include enzyme-linked immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescence
activated cell sorting (FACS). A two-site, monoclonal-based immunoassay utilizing monoclonal
antibodies reactive to two non-interfering epitopes on the protein of interest is preferred, but a
competitive binding assay may be employed. These and other assays are described in Hampton et al.,
1990, Serological Methods, a Laboratory Manual, APS Press, St Paul MN and Maddox et al., 1983,
J Exp Med 158:1211.

4. Purification of the Protein of Interest

Host cells transformed with a nucleotide sequence encoding a protein of interest may be
cultured under conditions suitable for the expression and recovery of the encoded protein from cell
culture. The protein produced by a recombinant cell may be secreted or contained intracellularly
depending on the sequence and/or the vector used. As will be understood by those of skill in the art,
expression vectors containing a sequence encoding a protein of interest can be designed with signal
sequences which direct secretion of the protein of interest through a prokaryotic or eukaryotic cell
membrane. Other recombinant constructions may join the sequence encoding the protein of interest to
the nucleotide sequence encoding a polypeptide domain which will facilitate purification of soluble
proteins (Kron et al., 1993, DNA Cell Biol, 12:441).

The protein of interest may also be expressed as a recombinant protein with one or more
additional polypeptide domains added to facilitate protein purification. Such purification-facilitating
domains include, but are not limited to, metal chelating peptides such as histidine-tryptophan modules
that allow purification on immobilized metals, protein A domains that allow purification on immobilized
immunoglobulin, and the domain utilized in the FLAGS extension/affinity purification system (Immunex
Corp, Seattle WA). The inclusion of a cleavable linker sequence such as Factor XA or enterokinase
(Invitrogen, San Diego CA) between the purification domain and the protein of interest is useful for
facilitating purification. One such expression vector provides for expression of a fusion protein
comprising the sequence encoding a foreign protein and nucleic acid sequence encoding 6 histidine
residues followed by thioredoxin and an enterokinase cleavage site. The histidine residues facilitate
purification while the enterokinase cleavage site provides a means for purifying the foreign protein
from the fusion protein.
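
The layout of such a fusion protein, and the release of the foreign protein by cleavage, can be sketched at the amino acid level in Python. This is a schematic model under stated assumptions: the enterokinase recognition sequence is taken as Asp-Asp-Asp-Asp-Lys with cleavage after the lysine, the carrier and foreign protein sequences are short hypothetical placeholders (the carrier string is not the thioredoxin sequence), and the function names are invented for the example.

```python
HIS_TAG = "H" * 6              # six histidine residues for metal-affinity capture
ENTEROKINASE_SITE = "DDDDK"    # cleavage is modeled after the terminal lysine


def build_fusion(carrier, protein):
    """Assemble tag, carrier, cleavage site and foreign protein, N to C terminus."""
    return HIS_TAG + carrier + ENTEROKINASE_SITE + protein


def enterokinase_cleave(fusion):
    """Split the fusion after the first enterokinase site.

    Returns (tagged_fragment, released_protein); if no site is present the
    uncleaved fusion is returned with an empty released fragment.
    """
    site_start = fusion.find(ENTEROKINASE_SITE)
    if site_start == -1:
        return fusion, ""
    cut = site_start + len(ENTEROKINASE_SITE)
    return fusion[:cut], fusion[cut:]


# Hypothetical placeholder sequences, for illustration only.
carrier_placeholder = "GSGSGS"
foreign_protein = "MKTAYIAK"

fusion = build_fusion(carrier_placeholder, foreign_protein)
tagged_fragment, released = enterokinase_cleave(fusion)
print(tagged_fragment, released)
```

In this model the histidine tag stays on the carrier fragment after cleavage, so a second pass over the immobilized metal would retain the tagged fragment and let the released foreign protein flow through.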

In addition to recombinant production, fragments of the protein of interest may be produced by
direct peptide synthesis using solid-phase techniques (Stewart et al., 1969, Solid-Phase Peptide
Synthesis, WH Freeman Co., San Francisco; Merrifield, 1963, J Am Chem Soc, 85:2149). In vitro
protein synthesis may be performed using manual techniques or by automation. Automated synthesis
may be achieved, for example, using the Applied Biosystems 431A Peptide Synthesizer (Perkin Elmer,
Foster City CA) in accordance with the instructions provided by the manufacturer. Various fragments
of a protein of interest may be chemically synthesized separately and combined using chemical
methods to produce the full length molecule.

I. Preparation of Antibodies 

Antibodies specific for the protein products of the candidate genes of the invention are useful
for protein purification, for the diagnosis and treatment of various diseases (e.g. osteoarthritis) and for
drug screening and drug design methods useful for identifying and developing compounds to be used in
the treatment of various diseases (e.g. osteoarthritis). By antibody, we include constructions using the
binding (variable) region of such an antibody, and other antibody modifications. Thus, an antibody
useful in the invention may comprise a whole antibody, an antibody fragment, a polyfunctional antibody
aggregate, or in general a substance comprising one or more specific binding sites from an antibody.
The antibody fragment may be a fragment such as an Fv, Fab or F(ab')2 fragment or a derivative
thereof, such as a single chain Fv fragment. The antibody or antibody fragment may be
non-recombinant, recombinant or humanized. The antibody may be of an immunoglobulin isotype, e.g., IgG,
IgM, and so forth. In addition, an aggregate, polymer, derivative and conjugate of an immunoglobulin
or a fragment thereof can be used where appropriate. Neutralizing antibodies are especially useful
according to the invention for diagnostics, therapeutics and methods of drug screening and drug
design.

Although a protein product (or fragment or oligopeptide thereof) of a candidate gene of the
invention that is useful for the production of antibodies does not require biological activity, it must be
antigenic. Peptides used to induce specific antibodies may have an amino acid sequence consisting of
at least five amino acids and preferably at least 10 amino acids. Preferably, they should be identical to
a region of the natural protein and may contain the entire amino acid sequence of a small, naturally
occurring molecule. Short stretches of amino acids corresponding to the protein product of a candidate
gene of the invention may be fused with amino acids from another protein such as keyhole limpet
hemocyanin or GST, and antibody will be produced against the chimeric molecule. Procedures well
known in the art can be used for the production of antibodies to the protein products of the candidate
genes of the invention.

For the production of antibodies, various hosts including goats, rabbits, rats, mice, etc. may be
immunized by injection with the protein products (or any portion, fragment, or oligopeptide thereof
which retains immunogenic properties) of the candidate genes of the invention. Depending on the host
species, various adjuvants may be used to increase the immunological response. Such adjuvants
include but are not limited to Freund's, mineral gels such as aluminum hydroxide, and surface active
substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet
hemocyanin, and dinitrophenol. BCG (bacilli Calmette-Guerin) and Corynebacterium parvum are
potentially useful human adjuvants.

1. Polyclonal antibodies. 

The antigen protein may be conjugated to a conventional carrier in order to increase its
immunogenicity, and an antiserum to the peptide-carrier conjugate will be raised. Coupling of a peptide
to a carrier protein and immunizations may be performed as described (Dymecki et al., 1992, J. Biol.
Chem., 267: 4815). The serum can be titered against protein antigen by ELISA (below) or
alternatively by dot or spot blotting (Boersma and Van Leeuwen, 1994, J Neurosci. Methods, 51: 317).
At the same time, the antiserum may be used in tissue sections prepared as described. A useful serum
will react strongly with the appropriate peptides by ELISA, for example, following the procedures of
Green et al., 1982, Cell, 28: 477.

2. Monoclonal antibodies. 

Techniques for preparing monoclonal antibodies are well known, and monoclonal antibodies
may be prepared using a candidate antigen whose level is to be measured or which is to be either
inactivated or affinity-purified, preferably bound to a carrier, as described by Arnheiter et al., 1981,
Nature, 294:278.

Monoclonal antibodies are typically obtained firomhybridoma tissue cultures or from ascites 
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fluid obtained from auimals into wMdi the hybridoma tissue was introduced. 

Monoclonal antibody-producing hybridomas (or polyclonal sera) can be screened for antibody 
binding to the target protein. 

3. Antibody Detection Methods 

Particularly preferred immunological tests rely on the use of either monoclonal or polyclonal 
antibodies and include enzyme-linked immunoassays (ELISA), immunoblotting and 
immunoprecipitation (see Voller, 1978, Diagnostic Horizons, 2:1, Microbiological Associates Quarterly 
Publication, Walkersville, MD; Voller et al., 1978, J. Clin. Pathol., 31: 507; U.S. Reissue Pat. No. 
31,006; UK Patent 2,019,408; Butler, 1981, Methods Enzymol., 73: 482; Maggio, E. (ed.), 1980, 
Enzyme Immunoassay, CRC Press, Boca Raton, FL) or radioimmunoassays (RIA) (Weintraub, B., 
Principles of radioimmunoassays. Seventh Training Course on Radioligand Assay Techniques, The 
Endocrine Society, March 1986, pp. 1-5, 46-49 and 68-78). For analysing tissues for the presence or 
absence of a protein produced by a candidate gene according to the present invention, 
immunohistochemistry techniques may be used. It will be apparent to one skilled in the art that the 
antibody molecule may have to be labelled to facilitate easy detection of a target protein. Techniques 
for labelling antibody molecules are well known to those skilled in the art (see Harlow and Lane, 1989, 
Antibodies, Cold Spring Harbor Laboratory). 

J. Preparation of a Labeled Protein 

1. Labeling of protein 

Labeling techniques are useful, according to the invention, for studying the biochemical 
properties, processing, intracellular transport, secretion and degradation of proteins. 

Biosynthetic labeling of proteins produced by candidate genes of the invention is preferably 
performed with 35S-methionine due to the high specific activity (>800 Ci/mmol) and ease of detection 
of this amino acid. Another amino acid should be used to label a protein that contains little or no 
methionine. 

According to the following protocol, either suspension cells or adherent cells are labeled with 
35S-methionine. Briefly, cells are washed and incubated for 15 min at 37°C in short-term labeling 
medium (complete serum-free, methionine-free RPMI or DMEM containing 5% (v/v) dialyzed fetal 
bovine serum) to deplete intracellular pools of methionine. Cells are then incubated in the presence of 
35S-methionine working solution (0.1 to 0.2 mCi/ml in 37°C short-term labeling medium) such that 4 ml 
of 35S-methionine working solution is added per 2 x 10^ suspension cells and 2 to 4 ml of 
35S-methionine working solution is added per 100 mm dish of adherent cells (0.5-2 x 10^7 cells), for a 
period of 30 min to 3 hours in a humidified, 37°C, 5% CO2 incubator. Upon completion of labeling, 
suspension cells are washed by centrifugation in ice-cold PBS. Following removal of labeling medium, 
adherent cells are washed with PBS, scraped and collected by centrifugation. Labeled cells are 
processed and analyzed by immunoaffinity chromatography, immunoprecipitation and one- and 
two-dimensional gel electrophoresis (Ausubel et al., supra). 

If the protein of interest is synthesized at a relatively low rate or is in a steady state, it may be 
necessary to label cells for an extended period of time. When performing long-term biosynthetic 
labeling of cells, it is necessary to include unlabeled methionine in the medium to maintain cell viability 
and to ensure that incorporation of label is maintained during the course of the experiment. According 
to this method, cells can be labeled in the presence of 35S-methionine in long-term labeling medium 
(90% methionine-free RPMI or DMEM) for up to 16 hours (Ausubel et al., supra). 

2. In vitro Translation 

The protein product of the cloned candidate gene of the invention can be produced by the 
methods of in vitro transcription and in vitro translation. In vitro transcription is performed essentially 
as described in Section B entitled "Production of a Polynucleotide Sequence" in the absence of a 
labeled ribonucleoside. The RNA produced by the in vitro transcription reaction will be extracted with 
phenol, ethanol precipitated twice and resuspended in 10 µl of TE buffer. In vitro translation is 
performed by adding 1 to 10 µl of RNA to an in vitro translation kit (e.g. wheat germ or reticulocyte 
lysate) in the presence of 15 µCi [35S]methionine, following the directions provided by the 
manufacturer. A typical reaction is carried out in a 30 µl volume at room temperature for 30 to 60 
minutes (Ausubel et al., supra). 


K. Production of Cells Expressing a Nucleotide Sequence Comprising a Polymorphism 

Mammalian cells expressing a nucleotide sequence comprising a polymorphism are useful, 
according to the invention, for determining the biochemical and functional properties of the protein 
product of a nucleotide sequence comprising a polymorphism, for analyzing expression of a candidate 
gene, for large scale production of a protein of interest, for drug screening and for the production of 
transgenic animals or knockout mice. 

Methods of efficiently introducing foreign DNA into mammalian cells are known in the art and 
include calcium phosphate transfection, DEAE-dextran transfection, electroporation and 
liposome-mediated transfection (Ausubel et al., supra). 

Transfection Protocols 

1. Calcium-Phosphate Transfection 

The method of calcium phosphate transfection involves preparing a precipitate by slowly 
mixing a HEPES-buffered saline solution with a mixture of calcium chloride and DNA. According to 
this method, up to 10% of the cells on a dish will incorporate DNA. 

Cells to be transfected are split one day prior to transfection so that on the day of transfection 
cells are well-separated on the plate. A 10 cm dish of cells is fed with 9.0 ml of complete medium 
approximately 2 to 4 hours before the addition of the precipitate. DNA to be transfected (10-50 µg per 
10-cm plate) is ethanol precipitated, resuspended in 450 µl sterile water and mixed with 50 µl of 2.5 M 
CaCl2. The DNA/CaCl2 solution is added dropwise to a 15-ml conical tube containing 500 µl 2X 
HeBS (0.283 M NaCl, 0.023 M HEPES acid, 1.5 mM Na2HPO4, pH 7.05). It is preferable to bubble 
the HeBS solution during the addition of the DNA mixture. After the precipitate has formed for 20 
minutes at room temperature, it is added evenly to the cells. The cells are incubated with the 
precipitate at 37°C in a humidified CO2 incubator for 4-16 hours. Following removal of the precipitate, 
the cells are washed with PBS and fed in complete medium. Glycerol or dimethyl sulfoxide shock can 
be used to increase the DNA uptake by certain types of cells (Ausubel et al., supra). 

20 2. DEAE-Dextran Transfection 

Cells to be transfected are plated at a concentration such that after 3 days of growth they are 
30-50% confluent. The DNA to be transfected (approximately 4 µg) is ethanol precipitated, 
resuspended in 40 µl TBS and added slowly while shaking to 80 µl of warm 10 mg/ml DEAE-dextran 
in TBS. After cells have been washed with PBS and fed with 4 ml of DMEM containing 10% Nu 
Serum per 10 cm dish, the DEAE-dextran/DNA mixture is evenly distributed over the entire plate. Cells 
are incubated with the DNA for approximately 4 hours in a humidified CO2 incubator. Following the 
removal of the DEAE-dextran/DNA mixture, cells are shocked by the addition of 5 ml of 10% DMSO 
in PBS. After a 1 minute incubation at room temperature, cells are washed with PBS and fed with 
complete medium (Ausubel et al., supra). 


3. Electroporation 

Alternatively, DNA can be introduced into cells by the use of high-voltage electric shocks, a 
technique termed electroporation. Briefly, according to the method of electroporation, cells are 
suspended in an appropriate electroporation buffer and placed in an electroporation cuvette. Following 
the addition of DNA, the cuvette is connected to a power supply and the cells are subjected to a 
high-voltage electrical pulse of a defined magnitude and length, optimized for the cell type being 
transfected. After a brief period of recovery, the cells are placed in normal culture medium. 

A population of cells to be transfected by electroporation is grown to late-log phase in 
complete medium. Typically stable transfection requires 5 x 10^6 cells, and transient transfection 
requires 1-4 x 10^7 cells. Cells are harvested by centrifugation for 5 minutes at 640 x g at 4°C. The 
resulting cell pellet is resuspended in half of the original volume of ice-cold electroporation buffer (e.g. 
PBS without calcium or magnesium, HEPES-buffered saline, tissue culture medium without serum, or 
phosphate-buffered sucrose (272 mM sucrose/7 mM K2HPO4, pH 7.4/1 mM MgCl2)). The choice of 
an electroporation buffer is dictated by the cell line. Cells are then harvested by centrifugation for 5 
minutes at 640 x g at 4°C, and resuspended at 1 x 10^7/ml in electroporation buffer at 0°C for stable 
transfection or at a higher concentration (up to 8 x 10^7/ml) for transient transfection. Aliquots of the 
cells (0.5 ml) are transferred into the desired number of electroporation cuvettes and placed on ice. 

DNA is added to the cell suspension in the cuvettes on ice. For stable transfection, DNA 
(optimally 1-10 µg) should be linearized with a restriction enzyme that cuts at a site in a non-essential 
region, purified by phenol extraction and ethanol precipitated. Supercoiled DNA (optimally 10 µg) may 
be used for transient transfection. The DNA/cell suspension is mixed, and incubated on ice for 5 
minutes. 

The cuvette is placed in the holder in the electroporation apparatus (at room temperature) and 
shocked one or more times at the desired voltage and capacitance settings. An electroporation 
apparatus useful according to the invention is the Bio-Rad Gene Pulser. The number of shocks and the 
voltage and capacitance settings will vary depending on the cell type, and should be optimized. The 
two parameters that are critical for successful electroporation are the maximum voltage for the shock 
and the duration of the current pulse. 

Following electroporation, the cuvette containing the mixture of cells and DNA is incubated on 
ice for 10 minutes. The transfected cells are diluted 20-fold in complete culture medium. For stable 
transfection, cells are grown for 48 hours in nonselective medium and then transferred to 
antibiotic-containing medium. For transient transfection, cells are incubated 50-60 hours and then 
harvested for the desired transient assay. 

L. Production of Animals Expressing a Nucleotide Sequence Comprising a Polymorphism 

Transgenic animals expressing a construct comprising a candidate gene containing a 
polymorphism, according to the invention, can be produced by methods well known in the art (reviewed 
in Reeves et al., supra). Knockout mice wherein a candidate gene according to the invention has been 
disrupted can be produced by methods well known in the art (reviewed in Moreadith and Radford, 
1997, J. Mol. Med., 75:208 and Shastry, 1998, Mol. Cell. Biochem., 181:163). These animals provide 
useful models for studying the functional consequences of one or more polymorphisms in a gene of 
interest. 

M. Production of a Candidate Gene Library 

The invention provides a method of producing a candidate gene library comprising genes that 
are potentially associated with the susceptibility to, or pathogenesis of, a disease. A candidate gene 
library is useful for determining the genetic basis of a disease of interest. 

Genetic susceptibility to a disease must occur as a result of specific DNA differences relative 
to non-susceptible individuals. In the case of osteoarthritis, many genes are known which are 
potentially involved in the susceptibility to, or pathogenesis of, the disease. These genes are included in 
the candidate gene library and the association of these genes with osteoarthritis is determined from 
population studies according to the invention. Unlike linkage studies, wherein a region of the genome 
that is thought to be involved in a disease is determined, the candidate gene strategy, including 
association studies, addresses the involvement of a particular gene in a disease. The results of 
association studies of candidate genes are used to identify genes that should be intensively studied as 
potential therapeutics or therapeutic targets. 

According to the invention, the full range of polymorphic sites within each candidate gene is 
identified and examined in diseased and normal populations. The frequency of each gene variant 
(allele) in each population is then compared to the other. If a specific polymorphism under analysis 
contributes to the disease phenotype, it will be present in the diseased population at a higher frequency 
than in the normal population. In addition, if the specific polymorphism under analysis does not itself 
contribute to the disease phenotype but resides elsewhere in, or is near to, a gene containing a 
contributory polymorphism, a significant association may be seen with the polymorphic marker being 
tested. This is because the two markers are in linkage disequilibrium with each other due to their close 
proximity. 
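
The comparison of allele frequencies between the diseased and normal populations can be illustrated with a short calculation. The following is a minimal sketch only, assuming a simple 2x2 chi-square test on allele counts; the function name `allele_association`, the choice of test and the example counts are assumptions made for illustration and are not prescribed by the invention.

```python
import math


def allele_association(case_counts, control_counts):
    """Compare variant-allele frequency in diseased vs. normal chromosomes.

    Each argument is a (variant_alleles, reference_alleles) pair of counts.
    Returns the two frequencies, the 2x2 chi-square statistic (1 degree of
    freedom, no continuity correction) and its p-value.
    """
    a, b = case_counts        # diseased population: variant, reference
    c, d = control_counts     # normal population: variant, reference
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("each row and column must contain at least one allele")
    n = row1 + row2
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return {
        "case_freq": a / row1,
        "control_freq": c / row2,
        "chi2": chi2,
        "p_value": p_value,
    }


# Hypothetical counts: variant on 60 of 200 diseased and 40 of 200 normal chromosomes.
result = allele_association((60, 140), (40, 160))
print(result)
```

A small p-value in such a test would be consistent either with a contributory polymorphism or with a marker in linkage disequilibrium with one; the calculation alone cannot distinguish the two cases.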

1. Strategies for Identifying Genes Associated with a Disease 

There are a number of methods known in the art for the identification of genes involved in a 
disease. These methods include familial linkage studies followed by positional cloning, differential gene 
expression studies on tissues, and population-based candidate gene association studies. Although 
positional cloning has proven to be useful for diseases resulting from a single mutation, this technique is 
not suitable for identifying genetic linkage in diseases where multiple genetic variants combine to 
create disease susceptibility. Furthermore, it has been demonstrated that the etiological basis of the 
majority of diseases comprises more than one gene. 

The goal of linkage studies is to determine the approximate position of disease genes by 
studying related individuals in families. According to linkage strategies, DNA markers that are 
randomly spaced throughout the genome, but are rarely located within genes, are tested for the 
frequency of their presence along with the particular disease phenotype. There is approximately a 
50% chance of an unlinked gene and marker gene co-localizing. If a particular marker is present at a 
significantly higher frequency than expected in disease individuals, this indicates that the marker is 
located in the vicinity of the disease gene. Usually the disease gene is delimited to a large region 
(containing tens to hundreds of genes). After a disease gene has been grossly mapped, this entire 
region must be extensively characterized to determine what genes are present in the region. Any gene 
that is identified according to this method becomes a candidate gene. 

Linkage studies have been used successfully to identify the genes responsible for certain 
genetic diseases originating from mutations in a single gene (monogenic diseases). However, most 
common human diseases are of polygenic origin, wherein changes in multiple genes cause an 
increased susceptibility to or pathogenesis of a particular disease. Because the DNA changes 
associated with genes which contribute to polygenic diseases are common in the population, thereby 
diluting the contribution of a given region of the genome to the disease, it is difficult to perform linkage 
studies on diseases of polygenic origin. 

Linkage analysis 

A series of genetic crosses is performed in an animal model system of a particular defect that 
is characteristic of a disease of interest (e.g. osteoarthritis) between individuals having an observable 
mutant phenotype and normal individuals of a control strain. At least one disease-related locus is used 
as a marker in these crosses. Alternatively, linkage analysis can be performed using chromosomal 
markers that do not comprise a disease-related locus (described below). If non-random assortment of 
the mutant trait with a marker locus is observed, and if that non-random assortment is statistically 
significant (for example, if a Student's t test or ANOVA is applied to the results), the trait is linked to 
the marker locus. 

Similarly, linkage analysis using an existing human or other mammalian pedigree may be 
performed. Pedigree analysis is a useful technique for identifying genes for which variant alleles may 
contribute to the risk, onset or progression of a disease in a family containing multiple individuals 
afflicted with a disease; according to this method, numerous genetic loci from affected and unaffected 
family members are compared. Non-random assortment of a given genetic marker between affected 
and unaffected family members relative to the distributions observed for other genetic loci indicates 
that the marker (for example, a variant isoform of a gene) either contributes to the disease or is in 
physical proximity to another that does so. 

If a non-random assortment of the disease-related phenotype with a marker locus is observed, 
using either approach, this is indicative of an association between the gene underlying the defect and 
that locus. Because the strength of any conclusion drawn from linkage analysis is statistically-based, 
the accuracy of the results is thought to be proportional to the number of crosses or family members 
and genetic loci analyzed. 
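
The test for non-random assortment of a trait with a marker can be illustrated numerically. This is a hedged sketch that substitutes a chi-square goodness-of-fit test against the 1:1 ratio expected for an unlinked marker in place of the Student's t test or ANOVA named in the text; the function name `linkage_test` and the offspring counts are hypothetical.

```python
import math


def linkage_test(parental, recombinant):
    """Test whether offspring classes depart from the 1:1 ratio of no linkage.

    parental    -- offspring in which trait and marker were inherited together
    recombinant -- offspring in which trait and marker were separated
    Returns (recombination_fraction, chi2, p_value) with 1 degree of freedom.
    """
    total = parental + recombinant
    if total <= 0:
        raise ValueError("at least one offspring must be scored")
    expected = total / 2.0  # unlinked loci assort independently (about 50:50)
    chi2 = ((parental - expected) ** 2 + (recombinant - expected) ** 2) / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return recombinant / total, chi2, p_value


# Hypothetical cross: 90 parental-type and 10 recombinant offspring.
fraction, chi2, p_value = linkage_test(90, 10)
print(f"recombination fraction={fraction:.2f}, chi2={chi2:.1f}, p={p_value:.2e}")
```

Because the statistic grows with the number of offspring scored, the sketch also reflects the point made above that confidence in a linkage result depends on the number of crosses or family members analyzed.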

Positional Cloning 

If linkage is confirmed, it is preferable to perform a molecular analysis of the region in which 
the peak of linkage maps. The wide availability of yeast artificial chromosome (YAC) or bacterial 
artificial chromosome (BAC) libraries facilitates this analysis. A nucleic acid sequence specific for a 
region encompassing a gene which is determined to occupy a map location of a particular locus of 
interest is examined, and open reading frames are evaluated to determine their relationship with the 
observed phenotype. An initial evaluation may be performed with the assistance of a computer 
program, such as the PathCalling™ (CuraGen) biological pathway discovery platform. All or a subset 
of the open reading frames present in the region are then cloned (e.g., by PCR) from mutant animals 
or affected family members and from their healthy counterparts (either control animals or unaffected 
family members), and the sequences of these open reading frames are compared. If a mutation or 
other allelic variant is found to be linked to individuals displaying the disease phenotype (in a 
statistically-significant, non-random manner), it can be concluded that this mutation is associated with a 
disease phenotype. A nucleic acid fragment containing this gene can be labeled and used as a probe 
for in situ hybridization analysis of fixed chromosomes of the human or other mammal to determine 
precisely the physical location of the gene. Furthermore, a gene that has been mapped and isolated in 
this manner may be useful as a candidate target for disease diagnosis and for drug targeting according 
to the invention (see below). 

2. Identification of Genes to be Included in Candidate Gene Library 

A candidate gene library according to the invention will include i. genes that are involved in 
known or predicted disease pathways, ii. new genes that are identified by a relevant pattern of specific 
tissue or cell expression, iii. genes that map to genomic regions of known linkage, and iv. gene 
sequences (from sequence databases) that are homologs of the above referenced categories of 
potential candidate genes. The choice of potentially related genes to be selected from a database will 
depend on the percent identity as calculated by Fast DB and based upon mismatch penalty, gap 
penalty, gap size penalty and joining penalty. Figure 1 summarized 

Based on the physiological changes associated with a disease of interest, predictions can be 
made regarding a cell or tissue-type that would be expected to express high or low levels of candidate 
genes associated with a particular disease. For osteoarthritis, it is expected that muscle, adipose, 
pancreas or liver tissue or tissue comprising insulin secreting pancreatic β-cells, would be useful for 
identifying candidate genes according to the invention. 

Differences in the expression of known and unknown genes in normal and disease tissue can 
be determined by methods known in the art including Serial Analysis of Gene Expression (SAGE) 
(Velculescu et al., 1995, Science, 270:484), subtractive hybridization/screening (described below), 
differential display (Liang and Pardee, 1992, Science, 257:967) high-density microarray expression 



The technique of SAGE allows for the rapid, detailed analysis of thousands of transcripts. 
SAGE depends on the following two principles. First, sufficient information is contained within a short 
nucleotide sequence (approximately 9-10 bp), isolated from a defined location within a transcript, to 
uniquely identify a transcript. Second, the concatenation of short tags of sequence allows transcripts to 
be analyzed serially by sequencing multiple tags within a single clone. 

The method of SAGE is performed by synthesizing double-stranded cDNA from mRNA, 
cleaving the resulting cDNA with an anchoring restriction endonuclease that is expected to cleave 
most transcripts at least one time, and isolating the most 3' region of the cleaved cDNA by binding to 
streptavidin beads. This protocol allows for the identification of a unique site on a transcript that 
corresponds to the restriction site located closest to the polyA tail. Replicate samples of the most 3' 
region of the cDNA are ligated to one of two linker molecules that contain a type IIS restriction site 
for a tagging enzyme. The cleavage site for Type IIS restriction endonucleases is located at a defined 
distance up to 20 bp from the asymmetric recognition site. Linkers are designed such that upon 
cleavage of the ligation product with the tagging enzyme there is release of the linker and an attached 
short region of cDNA. 

Following the creation of blunt ends, the two pools of released tags are ligated to each other 
and the resulting ligated product is used as a template for PCR amplification in the presence of 
primers that are specific for each linker. The PCR product is cleaved with the anchoring enzyme and 
amplification products, comprising two tags linked tail to tail, are isolated, concatenated by ligation, 
cloned and sequenced (Velculescu et al., supra). 
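
The idea that a short tag taken from a defined position identifies a transcript can be sketched computationally. The example below is illustrative only: it assumes an anchoring enzyme that recognizes CATG and a 10 bp tag, and the names `sage_tag` and `count_tags` are hypothetical. It models only the tag position and tag counting, not the linker ligation, ditag formation or sequencing steps.

```python
from collections import Counter


def sage_tag(cdna, anchor_site="CATG", tag_len=10):
    """Return the tag immediately 3' of the 3'-most anchoring site, or None.

    cdna is the sense-strand sequence written 5' to 3', so the site closest
    to the polyA tail is the last occurrence of anchor_site in the string.
    """
    seq = cdna.upper()
    pos = seq.rfind(anchor_site)
    if pos == -1:
        return None  # transcript is not cut by the anchoring enzyme
    start = pos + len(anchor_site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None  # too close to the 3' end


def count_tags(transcripts, anchor_site="CATG", tag_len=10):
    """Tally tags over a collection of transcript sequences."""
    tags = (sage_tag(t, anchor_site, tag_len) for t in transcripts)
    return Counter(tag for tag in tags if tag is not None)


# Hypothetical sequences: the first two yield the same tag, the third has no site.
library = ["GGCATGAAACATGTTTTCCCCGGGGAAAA", "ccatgTTTTCCCCGGAA", "AAAA"]
print(count_tags(library))
```

In such a tally the count for each tag serves as an estimate of the relative abundance of the corresponding transcript in the sample.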

Differential display provides a method for separating and cloning individual mRNAs by PCR 
analysis. According to the method of differential display, oligonucleotide primers are selected wherein 
one primer is anchored to the polyadenylate tail of a subset of mRNA species and the other primer is 
short and of an arbitrary sequence such that it anneals at different positions relative to the first primer. 
The mRNA subpopulations that are identified with these primer pairs are subjected to reverse 
transcription, amplified and analyzed on a DNA sequencing gel. By using multiple sets of primers, a 
reproducible pattern of amplified cDNA fragments that demonstrate a requirement for the sequence 
specificity of either primer can be obtained (Liang and Pardee, supra). 

According to the method of high-density microarray expression testing, DNA sequences to be 
tested for expression are spotted onto a surface, usually at high density to allow for the testing of 
many genes. The surface containing the DNA sequences is typically referred to as a 'chip'. The spotted 
DNA can be either cDNA clones or oligonucleotides. RNA is prepared from the two cells or tissues 
to be compared. The RNA from one cell/tissue will be labeled red and the RNA from the other 
cell/tissue will be labeled yellow. Both RNA preparations are hybridized to the DNA array. The ratio 
of red to yellow is indicative of the relative levels of expression between the two cells/tissues. 
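
The use of the red-to-yellow ratio as a measure of relative expression can be shown with a brief calculation. This is a minimal sketch under stated assumptions: signal intensities are supplied as plain numbers per spotted gene, ratios are reported on a log2 scale, and a small floor value guards against division by zero. The function name `log2_ratios`, the floor and the example intensities are hypothetical.

```python
import math


def log2_ratios(red, yellow, floor=1.0):
    """Return log2(red / yellow) for every gene measured in both channels.

    red, yellow -- dicts mapping gene name to signal intensity
    floor       -- minimum intensity used, so that empty spots do not divide by zero
    Positive values mean higher signal in the red-labeled sample.
    """
    ratios = {}
    for gene in red.keys() & yellow.keys():
        r = max(red[gene], floor)
        y = max(yellow[gene], floor)
        ratios[gene] = math.log2(r / y)
    return ratios


# Hypothetical intensities for two spotted genes.
red_signal = {"geneA": 800.0, "geneB": 100.0}
yellow_signal = {"geneA": 200.0, "geneB": 100.0}
print(log2_ratios(red_signal, yellow_signal))
```

A log scale is used here so that equal fold changes in either direction are symmetric around zero; in practice the two channels would also need to be normalized against each other before ratios are compared.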

3. Mapping a candidate gene 

Molecular and cytogenetic methods of mapping candidate genes are known in the art and are 
summarized below. Linkage analysis provides a method for identifying genes mapping to genomic 
regions of known linkage. 

Linkage analysis 

As described above, linkage analysis may be performed between an unmapped candidate 
gene and one or more of the disease-related loci or by analyzing the genetic linkage between the 
candidate gene and chromosomal markers which are not themselves linked to a disease-related locus, 
according to the same method. For the latter type of analysis it is preferable that the spacing of 
markers throughout the genome of the test organism is approximately one every cM or less. This 
spacing will ensure complete coverage of the genome and will facilitate accurate mapping. 

Other methods for mapping a candidate gene are provided below. 

Syntenic similarity 

As a result of classical genetic studies and, more recently, multi-laboratory genomic 
sequencing collaborations such as the Human Genome Project and Mouse Genome Project, the 
human and mouse genomes have been extensively characterized. It is now known that there is a 
significant degree of co-linearity among humans, mice and rats wherein there is conservation relative to 
one another among these several species in the chromosomal map positions of numerous genes and 
groups of genes. Examination of the human and/or mouse chromosomal maps in the regions 
comparable to those to which a particular locus of interest maps in the rat will yield candidate genes 
which may be responsible for the physiological changes associated with a disease of interest. The 
methods of radiation hybrid mapping or fluorescence in situ hybridization at low stringency to rat 
chromosomes using labeled fragments derived from the human or mouse genes can be used to 
confirm that genes present in these regions of the human and/or mouse are present in the regions of 
interest in the rat. 

Radiation hybrid (RH) mapping is a somatic cell hybrid technique that was developed to 
create high resolution, contiguous maps of mammalian chromosomes. The method is useful for 
ordering DNA markers spanning millions of base pairs of DNA at a resolution not easily obtained by 
other mapping methods (Cox et al., 1990, Science, 250: 245; Burmeister et al., 1991, Genomics, 9:19; 
Warrington et al., 1992, Genomics, 13: 803; Abel et al., 1993, Genomics, 17:632). Radiation hybrid 
mapping facilitates the mapping of non-polymorphic DNA markers that cannot be used for meiotic 
mapping. 

According to the method of radiation hybrid mapping, a lethal dose of X-irradiation is used to 
fragment the chromosomes of the donor cell line. Chromosome fragments from the donor cell line are 
then retained, in a non-selective manner, following cell fusion with a recipient cell line. The resulting 
hybrid clones are then analyzed for the presence or absence of specific donor chromosome markers. 
It is expected that markers that are farther apart on a chromosome are more likely to be broken apart 
by radiation and to segregate independently in the RH cells than markers that are closer together. By 
performing a statistical analysis of the co-segregation of various loci in hybrid clones, it is possible to 
construct a map that provides information regarding the relative order and distance of markers (Cox et 
al., 1990, supra; Warrington et al., 1991, Genomics, 11: 701; Ceccherini et al., 1992, Proc. Natl. Acad. 
Sci. USA, 89: 104). 
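
The co-segregation analysis can be illustrated with a deliberately simplified calculation. The sketch below scores each marker as present (1) or absent (0) in every hybrid clone and uses the fraction of clones retaining exactly one of two markers as a crude proxy for how often radiation separated them. This proxy, the function names `discordance` and `closest_neighbor`, and the retention data are assumptions for illustration; published RH mapping methods use more refined statistical estimators.

```python
def discordance(marker_a, marker_b):
    """Fraction of hybrid clones that retain exactly one of the two markers.

    marker_a, marker_b -- equal-length sequences of 1 (retained) or 0 (absent),
    one entry per hybrid clone, in the same clone order.
    """
    if len(marker_a) != len(marker_b) or len(marker_a) == 0:
        raise ValueError("retention vectors must be non-empty and of equal length")
    differing = sum(1 for a, b in zip(marker_a, marker_b) if a != b)
    return differing / len(marker_a)


def closest_neighbor(target, others):
    """Name of the marker in `others` that co-segregates best with `target`.

    others -- dict mapping marker name to its retention vector.
    """
    return min(others, key=lambda name: discordance(target, others[name]))


# Hypothetical retention patterns across four hybrid clones.
m1 = [1, 1, 0, 0]
panel = {"M2": [1, 1, 0, 1], "M3": [0, 0, 1, 1]}
print(closest_neighbor(m1, panel))
```

Markers with low discordance are candidates for being close together on the donor chromosome, and repeating the comparison across all marker pairs gives a starting point for ordering them.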

Subtractive screening 

In view of the observation that only a subset of an organism's genes are expressed in a given 
tissue, there is a high probability that transcripts which differ in expression between cells of the same 
tissue in a mutant and control animal are responsible for the observed mutant phenotype. 

According to the method of subtractive cloning, mRNA is isolated from a tissue of choice, 
wherein the tissue is obtained from two distinct organisms and wherein one organism displays a 
mutant phenotype with regard to a particular trait while the other is normal in that respect. Methods 
well known in the art are used to prepare cDNA from the mRNA derived from the first organism. The 
mRNA template is then degraded, either by hydrolysis under alkaline conditions or by RNAase 
H-mediated cleavage, and the cDNA is returned to a buffer in which mRNA is stable, and mixed with a 
molar excess of mRNA prepared from the second organism under conditions of stringent 
hybridization. The mixture is then passed over a hydroxyapatite column, which binds double-stranded 
nucleic acids but allows single-stranded nucleic acid molecules to pass through. Reverse transcripts 
derived from the first sample which do not hybridize to mRNA molecules derived from the second 
organism (in other words, reverse transcripts specific to the first tissue sample) are present in the 
flow-through fraction and are cloned into a vector to create a subtraction library. The reciprocal 
experiment (in which the cDNA is derived from the second mRNA preparation) is also carried out to 
create a complete set of transcripts specific to the tissue samples derived from the two organisms. 

This procedure will provide transcripts that can be labeled and used as probes in in situ 
hybridization analysis of immobilized chromosomes. The method of subtractive screening therefore 
yields both cloned genes as well as reagents useful for determining if the cloned genes co-localize with 
a locus of interest. If a particular gene is found to co-localize to a locus of interest, the gene may be 
analyzed functionally (e.g., in a phenotypic rescue experiment, as described below, or by the 
phenotypic assays described in Section F entitled "Identification and Characterization of 
Polymorphisms"). Ultimately, these genes may be used as targets for drugs or disease diagnostic 
methods, or even as therapeutic nucleic acids. 

Mutagenic transposon mapping 

The selection of insertional events that lie within genes (e.g., within coding or regulatory 
sequences) is facilitated by the use of entrapment vectors, first described in bacteria (Casadaban and 
Cohen, 1979, Proc. Natl. Acad. Sci. U.S.A., 76: 4530; Casadaban et al., 1980, J. Bacteriol., 143: 971). 
By employing animal models, entrapment vectors can be introduced into pluripotent ES cells in culture 
(for example, using electroporation or a retrovirus) and then passed into the germline via chimeras 
(Gossler et al., 1989, Science, 244: 463; Skarnes, 1990, Biotechnology, 8:827). Alternatively, transgenic 
animals containing entrapment vectors may be generated by standard oocyte injection protocols. 

These methods result in DNA integrations that are highly mutagenic because they interrupt 
the endogenous coding sequence. It is estimated that the frequency of obtaining a mutation in some 
gene in the genome using a promoter or gene trap is about 45%. For a detailed description of 
retroviral insertion mutagenesis see Methods Enzymol., vol. 225, 1990. Genes which are expressed in 
a tissue of interest and for which a biochemical assay of a particular activity has been developed in 
animal models are most useful according to this method. Promoter or gene trap vectors often contain a 
reporter gene, e.g., lacZ, CAT or green fluorescent protein (GFP), that lacks its own upstream 
promoter and/or splice acceptor sequence. That is, promoter gene traps contain a reporter gene with a 
splice site but no promoter. If the vector integrates within a gene and is spliced into the gene product, 
then the reporter gene will be expressed. Enhancer traps contain a reporter gene and have a minimal 
promoter which requires the activity of an enhancer in order to function. If the vector integrates near 
an enhancer (whether in a gene or not), then the reporter gene will be expressed. Activation of the 
reporter gene can only occur when the vector is integrated within an active host gene and generates a 
fusion transcript with the host gene. The activity of a reporter gene provides an easy assay for 
determining if a vector has been integrated into an expressed gene. Methods for detecting reporter 
gene activity in transfected cells or tissues of a transgenic animal are well known in the art. 

The mutagenic vector may be mapped using standard cytogenetic techniques, such as in situ
hybridization, wherein a labeled fragment comprising vector-specific sequence is used as a probe.
Co-localization of the probe with a particular locus of interest indicates that the associated gene is a
suitable candidate and should be subjected to further analysis. A gene that has been identified in this
manner can be cloned as described.

N. Diagnostic Indicators, Screens and Disease Symptoms

In another embodiment of the invention, there is provided a method of diagnosing or
determining susceptibility of a subject to joint space narrowing and/or osteophyte development
and/or joint pain. This method involves analyzing the genetic material of a subject to determine
which allele(s) of a gene is/are present. The method may include determining whether one or more
particular alleles are present, or which combination of alleles (i.e., a haplotype) is present. The
method may also include determining whether subjects are homozygous or heterozygous for a
particular allele or haplotype.
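
The zygosity determination described above can be illustrated with a minimal sketch. It assumes a diploid genotype has already been called at a single polymorphic site; the function name `classify_zygosity` and the returned labels are hypothetical and are not part of the disclosure.

```python
def classify_zygosity(allele_a: str, allele_b: str, risk_allele: str) -> str:
    """Classify a diploid genotype at one site relative to a given allele.

    allele_a and allele_b are the two alleles observed in the subject;
    risk_allele is the allele of interest (an assumed input, not a
    value taken from the specification).
    """
    copies = [allele_a, allele_b].count(risk_allele)
    if copies == 2:
        return "homozygous for risk allele"
    if copies == 1:
        return "heterozygous"
    return "risk allele absent"


if __name__ == "__main__":
    for genotype in (("A", "A"), ("A", "G"), ("G", "G")):
        print(genotype, classify_zygosity(*genotype, risk_allele="A"))
```

A haplotype could be handled the same way by passing a tuple of alleles across several sites in place of a single base, since the comparison relies only on equality.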

In a preferred embodiment, the method comprises determining which allele of one or more
polymorphisms of the invention is/are present. In particular, the method may include determining the
presence of a polymorphism of a gene which in combination with polymorphisms defined herein or
other polymorphisms may define a risk haplotype. The polynucleotide sequences for these
particular alleles may be used for diagnostic purposes. The polynucleotides which may be used
include oligonucleotides, complementary RNA and DNA molecules and PNAs. The
polynucleotides may be used to determine whether subjects are homozygous or heterozygous for a
particular allele or haplotype making them susceptible to joint space narrowing and/or osteophyte
development and/or joint pain, and hence, osteoarthritis.

In one aspect, hybridization with a PCR probe which is capable of detecting a particular
polymorphism may be used to identify nucleic acid sequences of particular alleles or haplotypes.
These probes must be specific to these particular alleles and the stringency of the hybridization or
amplification must be such that the probe identifies only this particular allele.

Means for producing specific hybridization probes for these polynucleotides of particular
alleles include the cloning of these polynucleotide sequences into vectors for the production of
mRNA probes, as is well known to one skilled in the art. Such vectors are known in the art, are
commercially available, and may be used to synthesize RNA probes in vitro by means of the
addition of the appropriate RNA polymerases and the appropriate labeled nucleotides.
Hybridization probes may be labeled by a variety of reporter groups, for example, by radionuclides
such as 32P or 35S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via
avidin/biotin coupling systems, and the like.

Polynucleotides of particular alleles or haplotypes may be used in Southern or northern
analysis, dot blot, or other membrane-based technologies; in PCR technologies; in dipstick, pin, and
multiformat ELISA-like assays; and in microarrays utilizing fluids or tissues from patients to detect
susceptibility to joint space narrowing and/or osteophyte development and/or joint pain. Such
qualitative methods are well known in the art.

In a particular embodiment, polynucleotides of particular alleles or haplotypes may be used in
assays that detect susceptibility to joint space narrowing and/or osteophyte development and/or
joint pain, particularly those mentioned above. Polynucleotides complementary to sequences of a
particular allele or haplotype may be labeled by standard methods and added to a fluid or tissue
sample from a patient under conditions suitable for the formation of hybridization complexes. After
a suitable incubation period, the sample is washed and it is determined if there is a signal. If a signal
is found, then the presence of the polynucleotide of a particular allele, alleles or haplotype in the
sample indicates the susceptibility to joint space narrowing and/or osteophyte development and/or
joint pain, and hence, osteoarthritis. Such assays may also be used to determine the particular
therapeutic treatment regimen for an individual patient.

With respect to osteoarthritis, the presence of a particular polymorphism or polymorphisms
in a tissue sample from an individual may indicate a predisposition for joint space narrowing and/or
osteophyte development and/or joint pain, or may provide a means for detecting osteoarthritis prior
to the appearance of actual clinical symptoms. A more definitive diagnosis of this type may allow
health professionals to employ preventative measures or aggressive treatment earlier, thereby
preventing the development or further progression of osteoarthritis.

Additional diagnostic uses for oligonucleotides designed from the polynucleotide sequences
of a particular allele or haplotype may involve the use of PCR. These oligomers may be chemically
synthesized, generated enzymatically, or produced in vitro. Oligomers will contain a fragment of a
polynucleotide of a particular allele, alleles or haplotype or a fragment of a polynucleotide
complementary to the polynucleotide of a particular allele, alleles or haplotype, and will be employed
under optimized conditions for identification of a specific polymorphism, polymorphisms or
haplotype. Oligomers may also be employed under very stringent conditions for detection of these
particular DNA or RNA sequences.

In further embodiments, oligonucleotides or longer fragments derived from any of the
polynucleotides described herein may be used as elements on a microarray. The microarray can be
used in transcript imaging techniques to detect a particular polymorphism, polymorphisms or
haplotype simultaneously as described below. In particular, this information may be used to develop
a pharmacogenomic profile of a patient in order to select the most appropriate and effective
treatment regimen for that patient. For example, therapeutic agents which are highly effective and
display the fewest side effects may be selected for a patient based on his/her pharmacogenomic
profile.

Microarrays may be prepared, used, and analyzed using methods known in the art
(Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl.
Acad. Sci. USA 93: 10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116;
Shalon, D. et al. (1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl.
Acad. Sci. USA 94:2150-2155; Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662). Various
types of microarrays are well known and thoroughly described in Schena, M., ed. (1999; DNA
Microarrays: A Practical Approach, Oxford University Press, London).

In another embodiment, a method involves the use of antibodies in diagnosing or determining
the susceptibility to joint space narrowing and/or osteophyte development and/or joint pain. The
antibodies would specifically bind to an epitope of a particular allele or form of the protein and may
be used to determine susceptibility to joint space narrowing and/or osteophyte development and/or
joint pain, and hence, osteoarthritis. Antibodies useful for diagnostic purposes may be prepared in
the same manner as described above. Diagnostic assays for determining susceptibility to joint space
narrowing and/or osteophyte development and/or joint pain include methods which utilize the
antibody and a label to detect a particular allele or form of the protein in human body fluids or in
extracts of cells or tissues. The antibodies may be used with or without modification, and may be
labeled by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter
molecules are known in the art and may be used.

A variety of protocols for measuring a particular allele or form of the protein, including
ELISAs, RIAs, and FACS, are known in the art and provide a basis for diagnosing susceptibility to
joint space narrowing and/or osteophyte development and/or joint pain.

O. Preparation of a Human Sample

The presence of an allelic form of a gene containing a sequence variation, according to the
invention, can be detected by testing any tissue of a human subject. Human samples that are useful
according to the invention include tissue or fluid samples containing a polynucleotide or polypeptide of
interest, including but not limited to plasma, serum, spinal fluid, lymph fluid, urine, stool, external
secretions of the skin, respiratory, intestinal and genitourinary tracts, saliva, blood cells, tumors, organs,
tissue and samples of in vitro cell culture constituents. Genomic DNA, cDNA or RNA can be
prepared from the human sample according to the methods described above.

P. Methods of Use 
1. Nucleic Acid Diagnosis and Diagnostic Kits

In order to detect the presence of an allele of a gene predisposing an individual to
osteoarthritis, a biological sample such as blood is prepared and analyzed for the presence or absence
of susceptibility alleles of a gene containing a polymorphism, according to the invention. Results of
these tests and interpretive information will be returned to the health care provider for communication
to the tested individual. Such diagnoses may be performed by diagnostic laboratories, or, alternatively,
diagnostic kits are manufactured and sold to health care providers or to private individuals for
self-diagnosis.

Initially, the screening method will involve amplification of the relevant gene sequences. In
another preferred embodiment of the invention, the screening method involves a non-PCR based
strategy. Such non-PCR based screening methods include Southern blot analysis to detect the
presence of a variant form of a gene in a sample comprising total genomic DNA from the individual
being tested. Alternatively, northern blot analysis can be used to detect an aberrant mRNA encoded
by a gene that exhibits altered stability or is the result of alternative splicing in a sample comprising
RNA from an individual being tested. The methods of S1 nuclease analysis, RNase protection and
primer extension can also be used to determine both the endpoint and the amount of a gene specific
mRNA (Ausubel et al., supra). Both PCR and non-PCR based screening strategies can detect target
sequences with a high level of sensitivity.

The preferred method, according to the invention, is target amplification. According to this
method, the target nucleic acid sequence is amplified with polymerases. One particularly preferred
method using polymerase-driven amplification is PCR (described above). The polymerase chain
reaction and other polymerase-driven amplification assays can achieve over a million-fold increase in
copy number through the use of polymerase-driven amplification cycles. PCR primers useful for target
amplification according to the invention will be designed to amplify a region of DNA containing one or
more polymorphisms. Allele specific primers (comprising one or more polymorphisms) are also useful
for detecting gene sequence variations by PCR methodologies according to the invention. The absence
of a particular polymorphism will be indicated by the absence of an amplified product when the
amplification step is carried out in the presence of allele specific primers. Once amplified, the resulting
nucleic acid can be sequenced and the specific sequence of the test DNA will be compared with the
wild type sequence by using the computer programs described in Section F entitled "Identification and
Characterization of Polymorphisms". Alternatively, the amplified product will be analyzed by Southern
blot assay with nucleic acid probes. Nucleic acid probes, useful according to the invention, will be
specifically hybridizable to a mutant form of a gene but not to the wild type gene due to the presence
of one or more polymorphisms.
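
The comparison of a sequenced test DNA with the wild type sequence can be sketched as follows. This is a simplified illustration only: it assumes the two sequences are already aligned and of equal length, and the function name `find_substitutions` is hypothetical. It is not the computer program referred to in Section F.

```python
def find_substitutions(wild_type: str, test: str) -> list[tuple[int, str, str]]:
    """List single-base differences between two pre-aligned sequences.

    Returns (position, wild_type_base, test_base) tuples, with positions
    counted from 1. Insertions and deletions are outside the scope of
    this sketch, so unequal lengths are rejected.
    """
    if len(wild_type) != len(test):
        raise ValueError("sequences must be the same length for this simple comparison")
    wild_type, test = wild_type.upper(), test.upper()
    return [
        (i + 1, w, t)
        for i, (w, t) in enumerate(zip(wild_type, test))
        if w != t
    ]


if __name__ == "__main__":
    # Illustrative sequences, not taken from the specification.
    print(find_substitutions("ACGTAC", "ACGAAC"))
```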

When a probe comprising the target sequence, according to the invention, is used to detect the
presence of the target sequences via non PCR-based strategies (for example, in screening for
osteoarthritis susceptibility), the biological sample to be analyzed, such as blood or serum, may be
treated, if desired, to extract the nucleic acids (as described above). The sample nucleic acids (isolated
from a biological sample or amplified by PCR) may be prepared in various ways to facilitate detection
of the target sequence; e.g., denaturation, restriction digestion, electrophoresis or dot blotting.
Preferably, the targeted region of the nucleic acids being analyzed is at least partially single-stranded
to form hybrids with the targeting sequence of the probe. If the sequence is naturally single-stranded,
denaturation will not be required. However, if the sequence is double-stranded, the sequence will
probably need to be denatured. Denaturation can be carried out by various techniques known in the
art.

To detect the presence of a sequence variation in a gene, according to the invention, analyte
nucleic acid and probe will be incubated under conditions which promote stable hybrid formation of the
target sequence in the probe with the putative targeted sequence in the sample DNA. If the region of
the probe which is used to bind to the analyte is designed to be completely complementary to the
targeted region, high stringency conditions are desirable in order to prevent false positives. However,
conditions of high stringency will be used only if the probes are complementary to regions of the
chromosome which are unique in the genome. The stringency of hybridization is determined by a
number of factors (described above). Detection, if any, of the resulting hybrid is usually accomplished
by the use of labeled probes. Alternatively, the probe may be unlabeled, but may be detectable by
specific binding with a ligand which is labeled, either directly or indirectly. Suitable labels, and methods
for labeling probes and ligands, are known in the art, and are described in Section C entitled "Production
of a Nucleic Acid Probe".

Accordingly, the foregoing screening method may be modified to identify individuals having a
gene containing a neutral polymorphism not associated with osteoarthritis, preferably by amplifying
DNA fragments of a gene derived from a particular individual. The amplified DNA fragments are
sequenced and the sequence is compared to the consensus gene sequence containing neutral
polymorphisms. At this time, differences between the individual's coding sequence for a gene and a
consensus sequence for the same gene are determined, wherein the presence of any neutral
polymorphisms and the absence of polymorphisms not previously identified as neutral polymorphisms
can be correlated with an absence of increased genetic susceptibility to osteoarthritis resulting from a
mutation in a gene coding sequence.

In another embodiment of the invention, detection of a polymorphism will be performed by
detecting loss of a restriction enzyme recognition site due to the presence of one or more
polymorphisms. According to this embodiment, a polymorphism will be detected with a polynucleotide
probe that is capable of detecting a restriction enzyme fragment containing the polymorphism, wherein
the fragment is of a size that can be easily separated on an agarose gel and visualized by Southern blot
analysis. A polynucleotide probe according to this embodiment of the invention can be specific for a
sequence within the candidate gene or outside of the candidate gene.
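
The loss of a restriction enzyme recognition site can be checked in silico before any gel is run. The sketch below is an assumption-laden illustration: it scans only the strand supplied, which is sufficient for palindromic sites such as the EcoRI site GAATTC used in the example, and the function names `count_sites` and `site_lost` are hypothetical. The sequences are invented for illustration.

```python
def count_sites(sequence: str, site: str) -> int:
    """Count occurrences (including overlapping ones) of a recognition site."""
    sequence, site = sequence.upper(), site.upper()
    width = len(site)
    return sum(
        1
        for i in range(len(sequence) - width + 1)
        if sequence[i:i + width] == site
    )


def site_lost(reference: str, variant: str, site: str) -> bool:
    """Return True if the variant carries fewer copies of the site than the reference."""
    return count_sites(variant, site) < count_sites(reference, site)


if __name__ == "__main__":
    reference = "TTGAATTCAA"   # contains one GAATTC site
    variant = "TTGAGTTCAA"     # single-base change inside the site
    print(site_lost(reference, variant, "GAATTC"))
```

A lost site means the corresponding fragment is no longer cut, so the probe would detect a larger fragment on the Southern blot.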

It is also contemplated within the scope of this invention that the nucleic acid probe assays of
this invention will employ a mixture of nucleic acid probes capable of detecting a gene. Thus, in one
example to detect the presence of a gene in a test sample, more than one probe complementary to a
gene is employed and in particular the number of different probes is alternatively 2, 3, or 5 different
nucleic acid probe sequences. In another example, to detect the presence of mutations in the gene
sequence in a patient, more than one probe complementary to a gene is employed wherein the probe
mixture includes probes capable of binding to the allele-specific mutations identified in populations of
patients with alterations in a gene. In this embodiment, any number of probes can be used, and will
preferably include probes corresponding to the major gene mutations identified as predisposing an
individual to osteoarthritis.

Northern blot analysis, S1 nuclease analysis, RNase protection and primer extension (Ausubel
et al., supra) are also methods according to the invention for detecting changes in mRNA resulting
from the presence of one or more polymorphisms in the sequence of a gene.

Additionally, any of the methods of genotyping described in Section F entitled "Identification and
Characterization of Polymorphisms" can be used for diagnostics according to the invention.

2. Peptide Diagnosis and Diagnostic Kits 

Osteoarthritis can also be detected on the basis of an alteration of the wild-type polypeptide.
Such alterations can be determined by sequence analysis in accordance with conventional techniques.
More preferably, antibodies (polyclonal or monoclonal) are used to detect differences in, or the
absence of, peptides derived from a gene of interest. The antibodies may be prepared as described
above in Section I entitled "Preparation of Antibodies". Preferably, antibodies will immunoprecipitate
the protein product of a gene from solution as well as react with the protein product of a gene on
Western or immunoblots of polyacrylamide gels. Antibodies useful according to the invention will also
detect the protein product of a gene in paraffin or frozen tissue sections, using immunocytochemical
techniques.

Preferred embodiments relating to methods for detecting wild type or mutant forms of the
protein product of a gene include enzyme linked immunosorbent assays (ELISA), radioimmunoassay
(RIA), immunoradiometric assays (IRMA) and immunoenzymatic assays (IEMA), including sandwich
assays using monoclonal and/or polyclonal antibodies. Exemplary sandwich assays are described by
David et al. in U.S. Pat. Nos. 4,376,110 and 4,486,530, hereby incorporated by reference.

3. Drug Screening

This invention is particularly useful for screening therapeutic compounds by using the mutant 
gene or protein product or binding fragment of the gene in any of a variety of drug screening 
techniques. 

The protein product or fragment of a gene employed in such a test may either be free in
solution, affixed to a solid support, expressed on the surface of a cell, or located intracellularly. One
method of drug screening utilizes eukaryotic or prokaryotic host cells which are stably transformed
with a recombinant polynucleotide expressing the polypeptide or fragment, preferably in competitive
binding assays. Such cells, either in viable or fixed form, can be used for standard binding assays. In
particular, these cells can be used to measure formation of a complex comprising the protein product
or fragment of a gene and the agent being tested. Alternatively, these cells can be used to determine if
the formation of a complex between the protein product or fragment of a gene and a known ligand is
interfered with by an agent being tested.

Thus, the present invention discloses methods useful for drug screening wherein such methods
comprise contacting a candidate drug with a polypeptide or fragment derived from a gene and
assaying (i) for the presence of a complex between the drug and the polypeptide or fragment
derived from a gene, or (ii) for the presence of a complex between the polypeptide or fragment
derived from a gene and a ligand, by methods well known in the art. Preferably, the polypeptide or
fragment derived from a gene is labeled for use in competitive binding assays. Methods for producing
a labeled protein by in vitro translation are described in Section J entitled "Preparation of a Labeled
Protein". Free polypeptide or fragment will be separated from that present in a protein:protein
complex, and the amount of free (i.e., uncomplexed) label will be used as a measure of the binding of
the test drug to the polypeptide or the ability of the test drug to interfere with protein:ligand binding.

Another method of drug screening allows for high throughput screening for compounds
exhibiting suitable binding affinity to the polypeptides and is described in detail in Geysen, WO
84/03564. According to this method, large numbers of different small peptide test compounds are
synthesized on a solid substrate, such as plastic pins or another suitable surface. The peptide
test compounds are reacted with the polypeptides or peptide fragments derived from a gene, and
washed. Bound polypeptide is then detected by methods well known in the art.

Purified protein can be coated directly onto plates for use in the aforementioned drug
screening techniques. Alternatively, non-neutralizing antibodies to the polypeptide can be used to
capture the polypeptide or peptide fragment of interest and immobilize it on the solid support.

Competitive drug screening assays in which neutralizing antibodies capable of specifically
binding the polypeptide of interest compete with a test compound for binding to the polypeptide or
fragments thereof of interest are also useful according to the invention. According to this method,
antibodies can be used to detect the presence of any test peptide which shares one or more antigenic
determinants with the polypeptide of interest.

An additional technique for drug screening involves the use of host eukaryotic cell lines or
cells (such as described above) which have a gene that produces a defective protein. According to
this method, the host cell lines or cells are grown in the presence of a test drug compound. The rate of
growth of the host cells is measured to determine if the compound is capable of regulating the growth
of cells expressing a nonfunctional protein product of a gene. Alternatively, the ability of the test
compound to restore the function of the mutant gene protein can be measured by using an appropriate
in vitro assay for function of the protein product of a gene. Suitable in vitro functional assays are
described in Section F entitled "Identification and Characterization of Polymorphisms". If the host cell
lines or cells express a protein product of a gene that exhibits an aberrant pattern of cellular
localization, the ability of the test compound to alter the cellular localization of the protein will be
determined. Changes in the cellular localization of a protein of interest will be detected by performing
cellular fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of
a protein of interest can be determined by immunocytochemical methods well known in the art.

A method of drug screening may involve the use of host eukaryotic cell lines or cells
(described above) which have an altered gene that demonstrates an aberrant pattern of expression.
By aberrant pattern of expression is meant that the level of expression is either abnormally high or low, or
the temporal pattern of expression is different from that of the wild type gene. The ability of a test
drug to alter the expression of a mutant form of a gene can be measured by Northern blot analysis, S1
nuclease analysis, primer extension or RNase protection assays. Alternatively, if a mutant form of a
gene contains a polymorphism in the promoter region of a gene, cells can be engineered to express a
reporter construct comprising a mutant gene promoter driving expression of a reporter gene (e.g.,
CAT, luciferase, green fluorescent protein). These cells can be grown in the presence of a test
compound and the ability of a test compound to alter the level of activity of the mutant gene promoter
can be determined by standard assays for each reporter gene which are well known in the art.
Candidate Drugs

A "candidate drug" as used herein, is any compound with a potential to modulate a phenotype
associated with a particular disease according to the invention.

A candidate drug is tested in a concentration range that depends upon the molecular weight of
the drug and the type of assay. For example, for inhibition of protein/protein complex formation, small
molecules (as defined below) may be tested in a concentration range of 1 pg - 100 mg/ml, preferably at
about 100 pg - 10 ng/ml; large molecules, e.g., peptides, may be tested in the range of 10 ng - 100
mg/ml, preferably 100 ng - 10 mg/ml.

Candidate drug compounds from large libraries of synthetic or natural compounds can be
screened. Numerous means are currently used for random and directed synthesis of saccharide,
peptide, and nucleic acid based compounds. Synthetic compound libraries are commercially available
from a number of companies including Maybridge Chemical Co. (Trevillet, Cornwall, UK), Comgenex
(Princeton, NJ), Brandon Associates (Merrimack, NH), and Microsource (New Milford, CT). A rare
chemical library is available from Aldrich (Milwaukee, WI). Combinatorial libraries are available and
can be prepared. Alternatively, libraries of natural compounds in the form of bacterial, fungal, plant
and animal extracts are available from, e.g., Pan Laboratories (Bothell, WA) or MycoSearch (NC), or
are readily producible by methods well known in the art. Additionally, natural and synthetically
produced libraries and compounds are readily modified through conventional chemical, physical, and
biochemical means.

Useful compounds may be found within numerous chemical classes, though typically they are
organic compounds, and preferably small organic compounds. Small organic compounds have a
molecular weight of more than 50 yet less than about 2,500 daltons, preferably less than about 750
daltons, more preferably less than about 350 daltons. Exemplary classes include heterocycles,
peptides, saccharides, steroids, and the like. The compounds may be modified to enhance efficacy,
stability, pharmaceutical compatibility, and the like. Structural identification of an agent may be used to
identify, generate, or screen additional agents. For example, where peptide agents are identified, they
may be modified in a variety of ways to enhance their stability, such as using an unnatural amino acid,
such as a D-amino acid, particularly D-alanine, by functionalizing the amino or carboxylic terminus,
e.g., for the amino group, acylation or alkylation, and for the carboxyl group, esterification or
amidification, or the like.
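
The molecular weight ranges given above for small organic compounds can be applied as a simple library pre-filter. The sketch below is illustrative only: it treats the "about" thresholds as hard cutoffs, which is an assumption, and the function name `size_class` and its labels are hypothetical.

```python
def size_class(molecular_weight: float) -> str:
    """Bin a compound by molecular weight (daltons) using the stated ranges.

    Assumes hard cutoffs at 50, 350, 750 and 2,500 daltons, although the
    text describes the upper bounds as approximate.
    """
    if not 50 < molecular_weight < 2500:
        return "outside small-molecule range"
    if molecular_weight < 350:
        return "more preferred"
    if molecular_weight < 750:
        return "preferred"
    return "small organic compound"


if __name__ == "__main__":
    # Illustrative molecular weights, not compounds named in the specification.
    for mw in (180.16, 500.0, 1000.0, 3000.0):
        print(mw, size_class(mw))
```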

Determination of Activity of a Drug

A candidate drug, assayed according to the invention as described above, is determined to be
effective if its use results in a change of about 10% of a phenotype associated with a disease
according to the invention.

The level of modulation by a candidate modulator of a phenotype associated with a disease
according to the invention may be quantified using any acceptable limits, for example, via the
following formula, which describes detections performed with a radioactively labeled probe (e.g., a
radiolabeled antibody in an immunobinding experiment or a radiolabeled nucleic acid probe in a
Northern hybridization).

\[
\text{Percent Modulation} = \frac{\mathrm{CPM}_{\mathrm{control}} - \mathrm{CPM}_{\mathrm{sample}}}{\mathrm{CPM}_{\mathrm{control}}} \times 100
\]

where CPM_control is the average of the cpm in antibody/ligand complexes or on Northern blots
resulting from assays that lack the candidate modulator (in other words, untreated controls), and
CPM_sample is the cpm in antibody/ligand complexes or on Northern blots resulting from assays
containing the candidate modulator. A similar calculation is performed where the assay comprises use
of a labeling system or system of measuring enzymatic activity in which there is a linear relationship
between the amount of label detected and the amount of protein or nucleic acid being represented per
unit of label or the amount of protein or nucleic acid represented by a unit of enzymatic activity.
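
The percent modulation calculation can be sketched as below. The sketch reads the formula as the difference between the averaged control cpm and the sample cpm, divided by the averaged control cpm and multiplied by 100; that reading, the function names `percent_modulation` and `is_effective`, and the use of an absolute 10% cutoff for "a change of about 10%" are assumptions made for illustration.

```python
from statistics import mean


def percent_modulation(control_cpm: list[float], sample_cpm: float) -> float:
    """Percent change in signal relative to untreated controls.

    control_cpm: cpm readings from assays lacking the candidate modulator.
    sample_cpm:  cpm reading from the assay containing the candidate modulator.
    """
    cpm_control = mean(control_cpm)
    if cpm_control == 0:
        raise ValueError("control cpm averages to zero; percent modulation is undefined")
    return (cpm_control - sample_cpm) / cpm_control * 100.0


def is_effective(modulation_percent: float, threshold: float = 10.0) -> bool:
    """Assumed rule: a change of at least `threshold` percent in either direction."""
    return abs(modulation_percent) >= threshold


if __name__ == "__main__":
    # Illustrative counts only, not experimental data.
    value = percent_modulation([1000, 1200, 800], 750)
    print(value, is_effective(value))
```

A positive value corresponds to a reduced signal in the presence of the candidate modulator and a negative value to an increased signal.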


4. Rational Drug Design 

Rational drug design is useful for producing either structural analogs of biologically active
polypeptides of interest or small molecules with which polypeptides of interest interact (e.g., agonists,
antagonists, inhibitors) in order to design drugs which are, for example, more active or stable forms of
the polypeptide, or which enhance or interfere with the function of a polypeptide in vivo. See, e.g.,
Hodgson, 1991, BioTechnology, 9:19. According to one method of rational drug design, the
three-dimensional structure of a protein of interest (e.g., the polypeptide product of the gene), or the
complex comprising the protein product of a gene in association with its ligand, is determined by x-ray
crystallography, by computer modeling or, most typically, by a combination of approaches.
Alternatively, useful information regarding the structure of a polypeptide may be obtained by modeling
based on the structure of homologous proteins. Rational drug design has been used successfully in the
development of HIV protease inhibitors (Erickson et al., 1990, Science, 249: 527).

Rational drug design may also involve the analysis of peptides derived from the protein
product of a gene by an alanine scan (Wells, 1991, Methods in Enzymol., 202:390). According to this
method, each of the amino acid residues of the peptide is sequentially replaced by alanine, and the
effect of this amino acid substitution on the peptide's activity is determined. This technique can be
used to determine the functionally relevant regions of the peptide.
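
The enumeration step of an alanine scan can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `alanine_scan` and the one-letter peptide string are hypothetical, and the activity measurement for each variant, which is the experimental part of the method, is not modeled.

```python
def alanine_scan(peptide: str) -> list[tuple[int, str, str]]:
    """List every single-residue alanine substitution of a peptide.

    Returns (1-based position, original residue, variant sequence) tuples.
    Residues that are already alanine are skipped because replacing them
    would leave the peptide unchanged.
    """
    variants = []
    for i, residue in enumerate(peptide):
        if residue == "A":
            continue  # substitution would be a no-op
        variant = peptide[:i] + "A" + peptide[i + 1:]
        variants.append((i + 1, residue, variant))
    return variants


if __name__ == "__main__":
    # Hypothetical three-residue peptide, for illustration only.
    for position, original, variant in alanine_scan("GAK"):
        print(f"{original}{position}A -> {variant}")
```

Each variant would then be synthesized and assayed; positions where the substitution reduces activity mark candidate functionally relevant residues.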

Another experimental approach to rational drug design will involve the isolation of a
target-specific antibody (selected by a functional assay) and the determination of the crystal structure of
this antibody. Theoretically, this approach will yield a pharmacore upon which subsequent drug design can
be based. Alternatively, if anti-idiotypic antibodies (anti-ids) specific for a functional,
pharmacologically active antibody are generated, there is no need to determine the crystallographic
structure of the target-specific antibody. It is expected that the binding site of the anti-ids will be an
analog of the original receptor. The anti-id could then be used to identify and isolate potentially
therapeutic peptides from banks of chemically or biologically produced peptides. These
selected peptides would then function as pharmacores.

According to these methods it may be possible to design drugs which demonstrate increased
activity or stability of the protein product of a gene or which function as inhibitors, agonists,
antagonists, etc. of the activity of a protein product of a gene. The availability of cloned gene
sequences, including polymorphisms, ensures that sufficient amounts of the polypeptide product of a
gene are available to facilitate analytical studies such as x-ray crystallography. Furthermore, the
knowledge of the sequence of the protein product of a gene provided herein will guide those using
computer modeling techniques in place of, or in addition to, x-ray crystallography.

5. Gene Therapy 

The present invention also provides a method of supplying wild-type gene function to a cell
which carries a mutant allele of a gene. By replacing a mutant gene with a wild-type gene, it may be
possible to reverse the symptoms of osteoarthritis in the recipient cells. A full-length version of the
wild-type gene, or a fragment of the gene, may be introduced into the cell in a vector such that the
gene remains extrachromosomal and is expressed by the cell from the extrachromosomal location.
More preferably, following introduction into the mutant cell, the wild-type gene or gene fragment
should recombine with the endogenous mutant gene already present in the cell. Such recombination
requires a double recombination event which results in the correction of the gene mutation. Vectors
for introduction of genes both for recombination and for extrachromosomal maintenance are known in
the art, and any suitable vector may be used. Methods for introducing DNA into cells such as
electroporation, calcium phosphate coprecipitation and lipofection are known in the art (described
above). Cells transformed with the wild-type gene can be used as model systems to study changes in
the intensity of symptoms associated with osteoarthritis and drug treatments which promote such
changes.

As generally discussed above, a gene or a fragment thereof, where applicable, may be used in
gene therapy methods in order to increase the amount of the expression products of such genes in
cells of patients with osteoarthritis. It may also be useful to increase the level of expression of a gene
even in those cells in which the mutant gene is expressed at a "normal" level, but the gene product is
not fully functional.

In other embodiments of the invention it may be useful to increase the amount of the
expression products of a mutant form of a gene in a cell that expresses the wild-type protein. Gene
therapy can be carried out according to generally accepted methods, for example, as described by
Friedman (1991, In Therapy for Genetic Diseases, T. Friedman ed., Oxford University Press, pp. 105-121).
Initially, the appropriate cells from a patient with osteoarthritis would be analyzed by the
diagnostic methods described above, to determine the level of production of a polypeptide from a gene
and the activity of a polypeptide product of a gene. A virus or plasmid vector (see further details
below), comprising a copy of a gene and suitable expression control elements, and capable of
replicating inside the cells, will be prepared. Suitable vectors are known and are disclosed in U.S. Pat.
No. 5,252,479 and PCT published application WO 93/07282. The vector will be injected into the
patient, either locally at an appropriate site according to the invention or systemically.
Gene transfer systems known in the art may be useful in the practice of the gene therapy
methods of the present invention. These include viral and nonviral transfer methods. A number of
viruses have been used as gene transfer vectors, including papovaviruses, e.g., SV40 (Madzak et al.,
1992, J. Gen. Virol., 73:1533), adenovirus (Berkner, 1992, Curr. Top. Microbiol. Immunol., 158:39;
Berkner et al., 1988, BioTechniques, 6:616; Gorziglia and Kapikian, 1992, J. Virol., 66:4407; Quantin et
al., 1992, Proc. Natl. Acad. Sci. USA, 89:2581; Rosenfeld et al., 1992, Cell, 68:143; Wilkinson et al.,
1992, Nucleic Acids Res., 20:2233; Stratford-Perricaudet et al., 1990, Hum. Gene Ther., 1:241),
vaccinia virus (Moss, 1992, Curr. Top. Microbiol. Immunol., 158:25), adeno-associated virus
(Muzyczka, 1992, Curr. Top. Microbiol. Immunol., 158:97; Ohi et al., 1990, Gene, 89:279),
herpesviruses including HSV and EBV (Margolskee, 1992, Curr. Top. Microbiol. Immunol., 158:67;
Johnson et al., 1992, J. Virol., 66:2952; Fink et al., 1992, Hum. Gene Ther., 3:11; Breakfield and Geller,
1987, Mol. Neurobiol., 1:337; Freese et al., 1990, Biochem. Pharmacol., 40:2189), and retroviruses of
avian (Bandyopadhyay and Temin, 1984, Mol. Cell. Biol., 4:749; Petropoulos et al., 1992, J. Virol.,
66:3391), murine (Miller, 1992, Curr. Top. Microbiol. Immunol., 158:1; Miller et al., 1985, Mol. Cell.
Biol., 5:431; Sorge et al., 1984, Mol. Cell. Biol., 4:1730; Mann and Baltimore, 1985, J. Virol., 54:401;
Miller et al., 1988, J. Virol., 62:4337), and human origin (Shimada et al., 1991, J. Clin. Invest., 88:1043;
Helseth et al., 1990, J. Virol., 64:2416; Page et al., 1990, J. Virol., 64:5370; Buchschacher and
Panganiban, 1992, J. Virol., 66:2731). Most human gene therapy protocols have been based on
disabled murine retroviruses.

Nonviral gene transfer methods known in the art include chemical techniques such as calcium
phosphate coprecipitation (Graham and van der Eb, 1973, Virology, 52:456; Pellicer et al., 1980,
Science, 209:1414); mechanical techniques, for example microinjection (Anderson et al., 1980, Proc.
Natl. Acad. Sci. USA, 77:5399; Gordon et al., 1980, Proc. Natl. Acad. Sci. USA, 77:7380; Brinster
et al., 1981, Cell, 27:223; Constantini and Lacy, 1981, Nature, 294:92); membrane fusion-mediated
transfer via liposomes (Felgner et al., 1987, Proc. Natl. Acad. Sci. USA, 84:7413; Wang and Huang,
1989, Biochemistry, 28:9508; Kaneda et al., 1989, J. Biol. Chem., 264:12126; Stewart et al., 1992,
Hum. Gene Ther., 3:267; Nabel et al., 1990, Science, 249:1285; Lim et al., 1992, Circulation, 83:2007);
and direct DNA uptake and receptor-mediated DNA transfer (Wolff et al., 1990, Science, 247:1465;
Wu et al., 1991, J. Biol. Chem., 266:14338; Zenke et al., 1990, Proc. Natl. Acad. Sci. USA, 87:3655;
Wu et al., 1989b, J. Biol. Chem., 264:16985; Wolff et al., 1991, BioTechniques, 11:474; Wagner et al.,
1990, Proc. Natl. Acad. Sci. USA, 87:3410; Wagner et al., 1991, Proc. Natl. Acad. Sci. USA, 88:4255;
Cotten et al., 1990, Proc. Natl. Acad. Sci. USA, 87:4033; Curiel et al., 1991a, Proc. Natl. Acad.
Sci. USA, 88:8850; Curiel et al., 1991b, Hum. Gene Ther., 3:147).

In an approach which combines biological and physical gene transfer methods, plasmid DNA
of any size is combined with a polylysine-conjugated antibody specific to the adenovirus hexon protein,
and the resulting complex is bound to an adenovirus vector. The trimolecular complex is then used to
infect cells. The adenovirus vector permits efficient binding, internalization, and degradation of the
endosome before the coupled DNA is damaged.

Liposome/DNA complexes have been shown to be capable of mediating direct in vivo gene
transfer. While in standard liposome preparations the gene transfer process is nonspecific, localized in
vivo uptake and expression have been reported in tumor deposits, for example, following direct in situ
administration (Nabel, 1992, Hum. Gene Ther., 3:399).

Gene transfer techniques which target DNA directly to an appropriate tissue, e.g., a tissue
that normally expresses the protein product of the candidate gene of the invention, are preferred.
Receptor-mediated gene transfer, for example, is accomplished by the conjugation of DNA (usually in
the form of covalently closed supercoiled plasmid) to a protein ligand via polylysine. Ligands are
chosen on the basis of the presence of the corresponding ligand receptors on the cell surface of the
target cell/tissue type. These ligand-DNA conjugates can be injected directly into the blood if desired
and are directed to the target tissue where receptor binding and internalization of the DNA-protein
complex occurs. To overcome the problem of intracellular destruction of DNA, coinfection with
adenovirus can be included to disrupt endosome function.


6. Peptide Therapy 

Peptides which have gene activity can be supplied to cells which carry mutant or missing
alleles of a gene. Alternatively, peptides specific for a mutant form of the protein product of a gene
can be supplied to cells carrying a wild-type protein. The protein product of a gene can be produced by
expression of the cDNA sequence in bacteria, for example, using known expression vectors (as
described in Section H entitled "Production of a Mutant Protein"). Alternatively, the protein product of
a gene can be extracted from mammalian cells engineered to produce the protein product of a gene of
interest. In addition, the techniques of synthetic chemistry can be employed to synthesize the protein
product of a gene. Any of the above techniques can provide a preparation of protein product of a gene
that is substantially free of other human proteins. This is most readily accomplished by carrying out
protein synthesis in a microorganism or in vitro.

Active gene molecules can be introduced into cells by microinjection or by the use of
liposomes, for example. Alternatively, some active molecules may be taken up by cells, actively or by
diffusion. Extracellular application of the protein product of a gene may be sufficient to decrease or
reverse the physiological effects of osteoarthritis. Other molecules with the activity of a protein
product of a gene (for example, peptides, drugs or organic compounds) may also be used to effect
such a reversal. Modified polypeptides having substantially similar function may also be useful for
peptide therapy.

7. Transformed Hosts

Cells and animals which carry a mutant allele of a gene can be used as model systems to
study and test for substances which have potential as therapeutic agents. Following application of a
test substance to the cells, the phenotype of the cell will be determined. Any variety of phenotypic
changes associated with osteoarthritis can be assessed, including insulin resistance and combined
insulin resistance/insulin secretion defect. Assays for each of these traits are known in the art.

Animals useful for testing therapeutic agents can be selected after mutagenesis of whole
animals or after treatment of germline cells or zygotes. Such treatments include insertion of mutant
alleles of a gene, usually from a second animal species, as well as insertion of disrupted homologous
genes. Alternatively, the endogenous gene of the animals may be disrupted by insertion or deletion
mutation or other genetic alterations using conventional techniques (Capecchi, 1989, Science,
244:1288; Valancius and Smithies, 1991, Mol. Cell. Biol., 11:1402; Hasty et al., 1991, Nature, 350:243;
Shinkai et al., 1992, Cell, 68:855; Mombaerts et al., 1992, Cell, 68:869; Philpott et al., 1992, Science,
256:1448; Snouwaert et al., 1992, Science, 257:1083; Donehower et al., 1992, Nature, 356:215).
Following the administration of test substances, the physiological changes associated with osteoarthritis
will be assessed. If the test substance prevents or suppresses any of these physiological changes, then
the test substance will be considered a candidate therapeutic agent for the treatment of osteoarthritis.
These animal models provide an extremely important testing vehicle for potential therapeutic products.


8. Use of a Polynucleotide as a Unique Sequence Marker: 

Polynucleotides can be used to mark objects or substances for the purposes of later
identification. Thus, polynucleotides of the invention are useful for tracking the manufacture and
distribution of a large number of diverse substances, including but not limited to: (1) natural resources
such as animals, plants, oil, minerals, and water; (2) chemicals such as drugs, solvents, petroleum
products, and explosives; (3) commercial by-products including pollutants such as radioactive or other
hazardous waste; and (4) articles of manufacture such as guns, typewriters, automobiles and
automobile parts. A nucleic acid according to the invention, when used as a marker, thus aids in the
determination of product identity and so provides information useful to manufacturers and consumers.

Polynucleotides have the advantage over other marking materials of being readily amplifiable
through the use of polymerase chain reaction (PCR) technology. The method of PCR is well known in
the art. PCR is performed as described by Mullis & Faloona, 1987, Methods Enzymol., 155:335, herein
incorporated by reference. It is the unique sequence of a polynucleotide which renders it useful as a
marker, since the sequence, or a characteristic pattern derived from its sequence, confers a property
on the polynucleotide which permits it to be tracked.

It is contemplated that a novel polynucleotide sequence of the invention, or fragments or
derivatives of it, may be used as markers by their attachment to or mixture in objects or substances to
be marked. Methods for marking various classes of substances and later detection of the tags in those
substances are disclosed in U.S. Patent Nos. 5,451,505 and 5,643,728.

Briefly, the use of a polynucleotide of the invention as a marker may entail combining a
polynucleotide with the substance or object to be marked, using methods appropriate to that substance
or object; and detecting the marker through amplification of the polynucleotide sequence using PCR
technology, followed by either sequence analysis or identification by other means known in the art
(e.g., hybridization assays).

The methods of applying a marker nucleic acid to a substance or object and subsequent
detection of that nucleic acid will vary depending upon the nature of the substance or object and the
environment to which it will be exposed. For example, inert solids such as paper, many pharmaceutical
products, wood, some foodstuffs, etc., can be either processed with the marker nucleic acid, or the
nucleic acid may be sprayed onto their surfaces. Chemically active substances, such as foodstuffs
with enzymatic activity, polymers with charged groups, or acidic pharmaceuticals may require that a
protective composition (e.g., liposomes) be added to the nucleic acid being used as a marker.

In order to mark liquids, the nucleic acid may be mixed directly with the liquid, or, if the
chemical nature of the liquid is not compatible with this approach (i.e., nucleic acids are not soluble in
the liquid), the nucleic acid may be mixed with a detergent to enhance its solubility. Containerized
gases may be marked simply by adding a nucleic acid to the container in dry form, as it will be
dispersed throughout the gas as the gas is released.

The amount of nucleic acid to add to a substance as a marker will also vary with the given
situation, as will the detection strategy. PCR technology, however, allows the amplification and
detection of as little as one molecule from a sample. Other means of detection, such as hybridization
assays, require that more nucleic acid be recovered from a sample to efficiently detect it. PCR can be
combined with a hybridization assay, however, to enhance the sensitivity of the method.

A nucleic acid sequence used as a marker will generally be from 20 to 1,000 bases long, and
preferably will be 60 to 1,000 bases long when PCR is to be used to detect the marker.

One example of a substance for which nucleic acid marking is suited is gunpowder. Marked
gunpowder may be prepared as follows: 1) add 16 ng of nucleic acid bearing the chosen marker
sequence (derived from a polynucleotide of the invention) to 1 ml of distilled water; 2) mix the solution
of nucleic acid with 1 g of nitrocellulose-based gunpowder; and 3) dry in air or under vacuum at 85°C.
To recover the marker from gunpowder: 1) wash the gunpowder sample with 1 ml of distilled water;
2) add 50 ml of the wash solution to a standard PCR mix, or, alternatively, place gunpowder flakes
directly into a 100 ml PCR mix; and 3) amplify according to standard PCR methods using primers
which anneal at opposite ends and on opposite strands of the sequence used as a marker (annealing
and extension conditions will depend upon the exact sequences chosen for oligonucleotide primers, and
may be adjusted according to methods known in the art).
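
The requirement that the primers anneal at opposite ends and on opposite strands of the marker can be illustrated with a short Python sketch. It is a simplified illustration, not a primer-design tool: the function names, the fixed primer length, and the example sequence are assumptions introduced here, and melting temperature, GC content and secondary structure are ignored.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def flanking_primers(marker: str, primer_len: int = 20) -> tuple[str, str]:
    """Pick one primer at each end of a marker sequence, on opposite strands.

    The forward primer is identical to the 5' end of the given strand; the
    reverse primer is the reverse complement of its 3' end, so it anneals to
    the given strand and extends back toward the forward primer.
    """
    marker = marker.upper()
    if len(marker) < 2 * primer_len:
        raise ValueError("marker is too short for two non-overlapping primers")
    forward = marker[:primer_len]
    reverse = reverse_complement(marker[-primer_len:])
    return forward, reverse


if __name__ == "__main__":
    # Hypothetical 20-base marker and 4-base primers, for illustration only.
    print(flanking_primers("AAAACCCCGGGGTTTTAACC", primer_len=4))
```

Both primers are written 5' to 3', which is the orientation in which oligonucleotides are normally ordered.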

Another example of a substance which may be marked with a nucleic acid according to the
invention is ink. To prepare a marked ink sample: if the ink is water insoluble, mix the nucleic acid
with detergents as for oil. If the ink is water soluble, add nucleic acid directly to the ink to a
concentration of about 1 to 20 ng per ml. To recover the marker from ink, proceed as for oils and
medicines.

In the above examples, the presence of an amplification product of the proper size (visualized,
for example, by gel electrophoresis alongside nucleic acid size markers followed by ethidium bromide
staining of the gel, according to standard methods) will indicate the presence of the marker in the
sample. In some instances, the PCR product may be further subjected to hybridization analysis or to
sequencing to enhance the accuracy of the method. A method of hybridization analysis which can be
used is described herein.

9. Use of a Polynucleotide of the Invention as a Marker for Chromosome Mapping:

Because a polynucleotide of the invention is novel (that is, its sequence is unique), it is useful
as a marker for chromosomal mapping. There are a number of methods of chromosomal mapping
known in the art. Prominent among them is the variant of the in situ hybridization technique known as
"Fluorescence In Situ Hybridization", or FISH. Details of methods and solutions used for in situ
hybridization are well-known in the art. There are many variations of the FISH technique itself;
however, the basic approach is similar in each case. Essentially, in situ hybridization of cells, nuclei, or
metaphase chromosome spreads is performed with a polynucleotide probe either directly labeled with
a fluorochrome, or labeled with a moiety which will be bound by a fluorochrome-tagged entity. The
hybridized probe is visualized by irradiation of the sample with light in the wavelength which excites
fluorescence from the fluorochrome. When combined with standard methods of karyotyping known in
the art, this method allows the polynucleotide sequence to be localized to a particular arm of a
particular chromosome. Once mapped to a specific chromosome, the location of the novel
polynucleotide sequence on that chromosome may be further localized by in situ hybridization along
with probes specific for known genes or sequences, labeled with other fluorescent tags which allow
the differentiation of the signals from the different probes. Such an approach and various adaptations
of it allow the localization of the novel gene relative to a known gene. Methods of generating and
using fluorescence-labeled polynucleotide probes for FISH and chromosome mapping are known in
the art (for example, see Malcolm et al., 1981, Ann. Hum. Genet., 45:134; Bar-Am et al., 1992,
Genes, Chromosomes & Cancer, 4:314; Pinkel et al., 1988, Proc. Natl. Acad. Sci. USA, 85:9138;
U.S. Patent No. 5,728,527). Additional variations of the chromosome mapping method utilize a PCR
approach (Dionne et al., 1990, BioTechniques, 8(2):190 and Iggo et al., 1989, Proc. Natl. Acad. Sci.
USA, 86:6211).

In addition to being able to determine the chromosomal location of the novel polynucleotide,
similar technology, in which FISH is combined with flow cytometry, will allow the polynucleotide of
the invention to be used to sort chromosomes, nuclei, or whole cells containing various dosages (i.e.,
gene copy numbers) of the gene encoding that polynucleotide (Hultdin et al., 1998, Nuc. Acids Res.,
26:3651). The novel polypeptide may also be useful as a diagnostic indicator of a disease, including but
not limited to those listed in Table I (Kuo et al., 1990, Am. J. Hum. Genet., 47:A119).

10. Use of a Polynucleotide of the Invention as a Marker for Analysis of Forensic 
Materials 

Forensic science depends heavily on methods for determining the source of various
compounds associated with criminal activity. In particular, the identification of individuals involved in
criminal activity through analysis of substances found at the crime scenes is critical. Such identification
is possible with genetic typing, which involves the determination of the genotype of an individual with
regard to loci which are polymorphic within the population. As used herein, "polymorphic" refers to a
gene or other segment of DNA which shows nucleotide sequence variability from individual to
individual. The use of PCR techniques and nucleotide probes to detect even single nucleotide changes
in a polynucleotide sequence has revolutionized the field of forensic serology (see Reynolds and
Sensabaugh, 1991, Anal. Chem., 63:2). For an example of polymorphisms useful for forensic
identification and methods of typing samples with regard to those polymorphisms, see U.S. Patent No.
5,273,883.

If a polynucleotide of the invention is found to have nucleotide sequence variation among
individuals within a population, it may be useful in the analysis of forensic samples. There are a
number of methods known to those skilled in the art for typing nucleic acids with regard to
polymorphisms. It should be understood that any such method is acceptable according to the invention.
One particular method is termed the "reverse dot blot" method. The basic steps involved are: 1)
oligonucleotides bearing the sequences of various polymorphic forms of the polynucleotide region to be
analyzed are bound to membranes; 2) labeled, PCR-amplified fragments, derived from the sample to
be genotyped, and corresponding to the polymorphic region ("target DNA") are allowed to hybridize to
the bound oligonucleotides under conditions which only allow the hybridization of molecules with 100%
complementary sequences; 3) unbound target DNA is removed; and 4) hybridized molecules are
detected.
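
The logic of the reverse dot blot readout can be mimicked in software as a rough sketch. The Python below assumes a deliberately idealized model: each probe is a short allele-specific sequence, the amplified target is double stranded and denatured before hybridization, and a probe gives a signal only when one target strand contains its exact complement. The function names, probe names and sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def reverse_dot_blot(probes: dict[str, str], target: str) -> list[str]:
    """Return the names of probes that would capture the amplified target.

    Idealized model of the four steps: the probes are the membrane-bound
    oligonucleotides, the target is the labeled PCR product, a probe "hybridizes"
    only if one strand of the target contains its exact (100%) complement,
    and everything else is treated as washed away.
    """
    target = target.upper()
    strands = (target, reverse_complement(target))
    hits = []
    for name, probe in probes.items():
        needed = reverse_complement(probe)  # sequence that pairs with the probe
        if any(needed in strand for strand in strands):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Hypothetical allele-specific probes differing at a single base.
    panel = {"allele_T": "GATTACA", "allele_C": "GATCACA"}
    print(reverse_dot_blot(panel, "CCGATTACAGG"))
```

A sample that lights up exactly one probe is read as carrying that allele; a heterozygous sample would be represented by running the function on both amplified alleles.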

The specific genotype of the individual from whom the target sample was obtained
(amplified), with regard to the polymorphic region of a polynucleotide of the invention, may thus be
determined by screening a panel of probes containing the known polymorphic sequence variations of
that region. It should be understood that the hybridization conditions may be adjusted by one of skill in
the art so that limited amounts of non-complementarity, including single base mismatches, may be
detected with this method.

Q. Pharmaceutical Compositions-Prevention and Treatment

1. Administration of Pharmaceutical Compositions 

Administration of pharmaceutical compositions is accomplished orally or parenterally.
Methods of parenteral delivery include topical, intra-arterial (directly to the tumor), intramuscular,
subcutaneous, intramedullary, intrathecal, intraventricular, intravenous, intraperitoneal, or intranasal
administration. In addition to the active ingredients, these pharmaceutical compositions may contain
suitable pharmaceutically acceptable carrier preparations which can be used pharmaceutically.

Pharmaceutical compositions for oral administration can be formulated using pharmaceutically
acceptable carriers well known in the art in dosages suitable for oral administration. Such carriers
enable the pharmaceutical compositions to be formulated as tablets, pills, dragees, capsules, liquids,
gels, syrups, slurries, suspensions and the like, for ingestion by the patient.

Pharmaceutical preparations for oral use can be obtained through combination of active
compounds with solid excipient, optionally grinding a resulting mixture, and processing the mixture of
granules, after adding suitable auxiliaries, if desired, to obtain tablets or dragee cores. Suitable
excipients are carbohydrate or protein fillers such as sugars, including lactose, sucrose, mannitol, or
sorbitol; starch from corn, wheat, rice, potato, or other plants; cellulose such as methyl cellulose,
hydroxypropylmethyl-cellulose, or sodium carboxymethyl cellulose; and gums including arabic and
tragacanth; and proteins such as gelatin and collagen. If desired, disintegrating or solubilizing agents
may be added, such as the cross-linked polyvinyl pyrrolidone, agar, alginic acid, or a salt thereof, such
as sodium alginate.

Dragee cores are provided with suitable coatings such as concentrated sugar solutions, which
may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol gel, polyethylene glycol, and/or
titanium dioxide, lacquer solutions, and suitable organic solvents or solvent mixtures. Dyestuffs or
pigments may be added to the tablets or dragee coatings for product identification or to characterize
the quantity of active compound, i.e., dosage.

Pharmaceutical preparations which can be used orally include push-fit capsules made of
gelatin, as well as soft, sealed capsules made of gelatin and a coating such as glycerol or sorbitol.
Push-fit capsules can contain active ingredients mixed with a filler or binders such as lactose or
starches, lubricants such as talc or magnesium stearate, and, optionally, stabilizers. In soft capsules,
the active compounds may be dissolved or suspended in suitable liquids, such as fatty oils, liquid
paraffin, or liquid polyethylene glycol with or without stabilizers.

Pharmaceutical formulations for parenteral administration include aqueous solutions of active
compounds. For injection, the pharmaceutical compositions of the invention may be formulated in
aqueous solutions, preferably in physiologically compatible buffers such as Hank's solution, Ringer's
solution, or physiologically buffered saline. Aqueous injection suspensions may contain substances
which increase the viscosity of the suspension, such as sodium carboxymethyl cellulose, sorbitol, or
dextran. Additionally, suspensions of the active compounds may be prepared; suitable solvents or
vehicles include fatty oils such as sesame oil, or synthetic fatty acid esters, such as ethyl oleate or
triglycerides, or liposomes. Optionally, the suspension may also contain suitable stabilizers or agents
which increase the solubility of the compounds to allow for the preparation of highly concentrated
solutions.

For topical or nasal administration, penetrants appropriate to the particular barrier to be
permeated are used in the formulation. Such penetrants are generally known in the art.


2. Manufacture and Storage 

The pharmaceutical compositions of the present invention may be manufactured in a manner
that is known in the art, e.g., by means of conventional mixing, dissolving, granulating, dragee-making,
levitating, emulsifying, encapsulating, entrapping or lyophilizing processes.
The pharmaceutical composition may be provided as a salt and can be formed with many
acids, including but not limited to hydrochloric, sulfuric, acetic, lactic, tartaric, malic, succinic, etc.
Salts tend to be more soluble in aqueous or other protonic solvents than are the corresponding free
base forms. In other cases, the preferred preparation may be a lyophilized powder in 1 mM-50 mM
histidine, 0.1%-2% sucrose, 2%-7% mannitol at a pH range of 4.5 to 5.5 that is combined with buffer
prior to use.

After pharmaceutical compositions comprising a compound of the invention formulated in an
acceptable carrier have been prepared, they can be placed in an appropriate container and labeled for
treatment of an indicated condition with information including amount, frequency and method of
administration.


3. Therapeutically Effective Dose 

Pharmaceutical compositions suitable for use in the present invention include compositions
wherein the active ingredients are contained in an effective amount to achieve the intended purpose.
The determination of an effective dose is well within the capability of those skilled in the art.

For any compound, the therapeutically effective dose can be estimated initially either in cell
culture assays, or in animal models, usually mice, rabbits, dogs, or pigs. The animal model is also used
to achieve a desirable concentration range and route of administration. Such information can then be
used to determine useful doses and routes for administration in humans.

A therapeutically effective dose refers to that amount of protein or its antibodies, antagonists,
or inhibitors which ameliorates the symptoms or conditions. Therapeutic efficacy and toxicity of such
compounds can be determined by standard pharmaceutical procedures in cell cultures or experimental
animals, e.g., ED50 (the dose therapeutically effective in 50% of the population) and LD50 (the dose
lethal to 50% of the population). The dose ratio between therapeutic and toxic effects is the
therapeutic index, and it can be expressed as the ratio, LD50/ED50. Pharmaceutical compositions
which exhibit large therapeutic indices are preferred. The data obtained from cell culture assays and
animal studies are used in formulating a range of dosage for human use. The dosage of such
compounds lies preferably within a range of circulating concentrations that include the ED50 with
little or no toxicity. The dosage varies within this range depending upon the dosage form employed,
sensitivity of the patient, and the route of administration.
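
As a worked illustration of the ratio just described, the therapeutic index can be computed directly from the two doses. The Python sketch below is illustrative only: the function name and the numeric values are hypothetical, and both doses are assumed to be expressed in the same units.

```python
def therapeutic_index(ld50: float, ed50: float) -> float:
    """Return the therapeutic index LD50/ED50.

    Both doses must be positive and expressed in the same units
    (for example mg per kg body weight).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


if __name__ == "__main__":
    # Hypothetical doses, not taken from the disclosure.
    print(therapeutic_index(ld50=500.0, ed50=25.0))
```

A larger index means a wider margin between the dose that is effective and the dose that is lethal in half of the test population.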

The exact dosage is chosen by the individual physician in view of the patient to be treated.
Dosage and administration are adjusted to provide sufficient levels of the active moiety or to maintain
the desired effect. Additional factors which may be taken into account include the severity of the
disease state; age, weight and gender of the patient; diet, time and frequency of administration, drug
combination(s), reaction sensitivities, and tolerance/response to therapy. Long acting pharmaceutical
compositions might be administered every 3 to 4 days, every week, or once every two weeks
depending on the half-life and clearance rate of the particular formulation.

Dosage amounts may vary from 0.1 to 100,000 micrograms per person per day, for example,
1 µg, 10 µg, 100 µg, 500 µg, 1 mg, 10 mg, and even up to a total dose of about 1 g per person per day,
depending upon the route of administration. Guidance as to particular dosages and methods of delivery
is provided in the literature. See U.S. Patent Nos. 4,657,760; 5,206,344; or 5,225,212, hereby
incorporated by reference. Those skilled in the art will employ different formulations for nucleotides
than for proteins or their inhibitors. Similarly, delivery of polynucleotides or polypeptides will be specific
to particular cells, conditions, locations, etc.

Without further elaboration, it is believed that one skilled in the art can, using the preceding
description, utilize the present invention to its fullest extent. The following embodiments are, therefore,
to be construed as merely illustrative, and not limitative of the remainder of the disclosure in any way
whatsoever.

The disclosures of all patents, applications, and publications mentioned above and below, 
including U.S. Ser. No. 60/342,603, are hereby expressly incorporated by reference. 



EXAMPLES

1. Establishment of an Association Between a Given Polynucleotide Sequence and
Diabetes

A polynucleotide sequence according to the invention containing a mutation which is believed
to be associated with a disease can be statistically linked to that disease by linkage analysis. An
animal model system exhibiting a particular phenotypic defect that is characteristic of the disease of
interest is selected. A series of genetic crosses is performed in this animal model system between
individuals having an observable mutant phenotype and normal individuals of a control strain. At least
one disease-related locus or a chromosomal marker that does not comprise a disease-related locus is
used as a marker in these crosses. If a statistically significant pattern of non-random assortment of the
mutant trait with a marker locus is observed, the trait is linked to the marker locus.

Similarly, linkage analysis can be performed on an existing human or other mammalian
pedigree. According to this method, numerous genetic loci from affected and unaffected family
members are compared. Non-random assortment of a given genetic marker between affected and
unaffected family members relative to the distributions observed for other genetic loci indicates that
the marker (for example, a variant isoform of a gene) either contributes to the disease or is in physical
proximity to another that does so.

If either approach demonstrates a non-random assortment of the disease-related phenotype
with a marker locus, this is indicative of an association between the gene underlying the defect and
that locus. Because the strength of any conclusion drawn from linkage analysis is statistically based,
the accuracy of the results is thought to be proportional to the number of crosses or family members
and genetic loci analyzed.
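
As a simplified illustration of testing for non-random assortment, the sketch below applies a Pearson chi-square test (one degree of freedom, no continuity correction) to a 2 x 2 table of marker status against disease status. This is a stand-in chosen for brevity and is not the method of the disclosure; formal linkage analysis in pedigrees is normally based on LOD scores. The function name and the counts are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test of independence for the table

                   marker present   marker absent
        affected         a                b
        unaffected       c                d

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts of family members, for illustration only.
stat, p_value = chi_square_2x2(a=30, b=10, c=12, d=28)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

A small p-value under these assumptions would indicate that the marker and the phenotype do not assort independently; as noted above, the reliability of such a conclusion grows with the number of individuals and loci analyzed.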

2. Screening Assay For a Disease 

A polynucleotide sequence according to the invention can be used as a marker for a normal
phenotype or for a phenotype associated with a disease of interest.

If it can be demonstrated by the methods of phenotyping described above that a particular
sequence is associated with a disease phenotype, this sequence can be used as a marker for that
disease. A sequence of interest can be used as a probe to screen genomic DNA from
individuals by Southern blot analysis according to the method described above. If the sequence of
interest is detected by Southern blot analysis, and the presence of this sequence is confirmed by direct
sequencing, it can be concluded that the individual from which the genomic DNA has been isolated
has an increased likelihood of developing the disease for which the sequence is a marker.

The marker can also be used as a disease indicator according to the method of PCR. A
genomic DNA sample of interest can be analyzed in a PCR reaction wherein one of the primers
contains the marker sequence. If the marker sequence is present in the sample DNA, a PCR product
will be produced. Alternatively, the PCR primers can be designed such that they amplify a region
containing the marker sequence. The amplified product can be analyzed by hybridization methods,
described above, to determine the presence of the sequence of interest.
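
The second PCR approach (primers flanking the marker) can be sketched in silico as follows. This is a toy model assuming exact primer matches on a single strand; it ignores mismatch tolerance, melting temperature and all laboratory conditions. All function names and sequences are hypothetical.

```python
from typing import Optional


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def amplicon(template: str, forward: str, reverse: str) -> Optional[str]:
    """Return the region bounded by the primer pair, or None if either
    primer has no exact binding site on the template."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding
    # site on this strand is its reverse complement.
    rev_site = reverse_complement(reverse)
    end = template.find(rev_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(rev_site)]


def marker_detected(template: str, forward: str, reverse: str, marker: str) -> bool:
    """True if a product is formed and it contains the marker sequence."""
    product = amplicon(template, forward, reverse)
    return product is not None and marker.upper() in product


# Hypothetical sample and primers, for illustration only.
sample = "TTGACCTGAAGCTAGGCTAACGTTAGCCGATCGGATTCCA"
fwd_primer = "GACCTGAAGC"
rev_primer = "GAATCCGATC"
print(marker_detected(sample, fwd_primer, rev_primer, "CTAACGTTAG"))
```

The first approach, in which a primer itself contains the marker sequence, corresponds to `amplicon` returning `None` whenever that primer finds no binding site in the sample.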

3. Use of a Given Polynucleotide as a Target for Drug Screening 

A polynucleotide according to the invention, containing a mutation which is believed to be
associated with a disease, can be used as a target for drug screening.

One method of drug screening utilizes eukaryotic or prokaryotic host cells which are stably
transformed with a polynucleotide according to the invention and either exhibit a particular phenotype
characteristic of the presence of the polynucleotide or express a polypeptide or fragment encoded by
the polynucleotide. Such cells, either in viable or fixed form, can be used for standard competitive
binding assays. In particular, these cells can be used to measure formation of a complex comprising
the protein product or fragment of a polynucleotide according to the invention and the agent being
tested. Alternatively, these cells can be used to determine if the formation of a complex between the
protein product or fragment of a polynucleotide according to the invention and a known ligand is
interfered with by an agent being tested.

An alternative method for drug screening involves the use of eukaryotic cell lines or cells (such
as described above) which contain a polynucleotide according to the invention that produces a
defective protein. According to this method, the host cell lines or cells are grown in the presence of a
test drug. The rate of growth of the host cells is measured to determine if the compound is capable of
regulating the growth of cells expressing a nonfunctional protein product of the polynucleotide
according to the invention. Preferably, a drug that is useful according to the invention will increase or
decrease the growth rate of a cell by at least 10%. Alternatively, the ability of the test compound to
restore the function of the mutant gene protein by at least 10% can be measured by using an
appropriate in vitro assay for function of the protein product of a gene (as described in Section F
entitled "Identification and Characterization of Polymorphisms"). If the host cell lines or cells express
a protein product of a gene that exhibits an aberrant pattern of cellular localization, the ability of the
test compound to alter the cellular localization of the protein by at least 10% will be determined.
Changes in the cellular localization of a protein of interest will be detected by performing cellular
fractionation studies with biosynthetically labeled cells. Alternatively, the cellular localization of a
protein of interest can be determined by immunocytochemical methods well known in the art.

A method of drug screening may also involve the use of host eukaryotic cell lines or cells
(described above) which have an altered gene that demonstrates an aberrant pattern of expression.
By aberrant pattern of expression is meant that the level of expression is either abnormally high or low, or
the temporal pattern of expression is different from that of the wild-type gene. The ability of a test
drug to alter the expression of a mutant form of a gene by at least 10% can be measured by Northern
blot analysis, S1 nuclease analysis, primer extension or RNase protection assays, as described above.
Alternatively, if a mutant form of a gene contains a polymorphism in the promoter region of the gene,
cells can be engineered to express a reporter construct comprising the mutant gene promoter driving
expression of a reporter gene (e.g. CAT, luciferase, green fluorescent protein). These cells can be
grown in the presence of a test compound, and the ability of the test compound to alter the level of
activity of the mutant gene promoter can be determined by standard assays for each reporter gene,
which are well known in the art.

A transgenic animal whose genomic DNA contains a polynucleotide associated with a
particular phenotypic defect that is characteristic of the disease of interest, and a normal control
animal (not containing the polynucleotide), can be treated with a candidate drug according to the
invention. The ability of a candidate drug to ameliorate symptoms of the disease by at least 10% will
be analyzed by assessing the disease symptoms and their amelioration.
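
The "at least 10%" criterion used throughout this example can be expressed as a simple relative-change test, sketched below. The sketch assumes that the change is measured relative to an untreated control and that a change in either direction counts; the function names and readings are hypothetical, and no particular assay is implied.

```python
def percent_change(control: float, treated: float) -> float:
    """Relative change of the treated measurement versus control, in percent."""
    if control == 0:
        raise ValueError("control measurement must be non-zero")
    return (treated - control) / abs(control) * 100.0


def meets_threshold(control: float, treated: float, threshold_pct: float = 10.0) -> bool:
    """True if the measurement rises or falls by at least threshold_pct."""
    return abs(percent_change(control, treated)) >= threshold_pct


# Hypothetical readings (e.g. growth rate or reporter activity), illustration only.
print(meets_threshold(control=1.0, treated=1.25))    # 25% increase
print(meets_threshold(control=200.0, treated=170.0))  # 15% decrease
print(meets_threshold(control=1.0, treated=1.05))    # 5% increase
```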

4. Selection of Osteoarthritis Candidate Gene Set

Genes involved in osteoarthritis 

Key pathogenic processes involved in osteoarthritis are: 

1. chondrocyte differentiation, development, apoptosis and signalling

2. cartilage components and synthesis: proteoglycans, hyaluronan synthases, extracellular
matrix molecules

3. cartilage degradation: cathepsin proteases and matrix metalloproteinases, their inhibitors

4. bone remodelling signals (e.g. RANK/RANKL): BMPs, TGFbeta, interleukins, their
receptors and antagonists, downstream signalling

5. synovial fluid components

6. systemic factors influencing bone and cartilage remodelling: leptin, estrogen, progesterone,
inflammatory cytokines, retinoic acid

Polymorphisms at the following genes have been reported in the literature to be involved with
increased risk of osteoarthritis. They include components of the extracellular matrix and
bone-remodelling signalling components (Table 2).

With the aim of expanding and improving on the current limited knowledge of osteoarthritis
genetic predisposition, we have collected over 500 candidate bone and cartilage remodelling genes
using the following methods:

1. extensive literature search for genes involved in relevant biochemical pathways and
physiological processes

2. analysis and comparisons of cDNA libraries within the Incyte Lifeseq® database from
relevant normal and diseased tissues and in vitro modelling systems

3. co-expression analysis using Incyte's "Guilt by Association" algorithm, which identifies novel
genes in key biochemical pathways by comparing the expression patterns of genes within the
Lifeseq® database

5. Polymorphisms in Genes Associated with Osteoarthritis 

The osteoarthritis candidate gene list was compiled using genes or gene sequences selected
from literature sources, using sequence homology, library subtraction and expression analysis.

Expression analysis was performed using "guilt-by-association" queries to identify
Incyte-novel and known genes not previously associated with diabetes which have similar expression
patterns to genes known to be involved in diabetes or related conditions. Guilt-by-association analysis
was performed as described in Walker et al. 1999 Genome Res 9:1198; Walker et al. 1999 ISMB:282;
and US Patent Application 09/226394 entitled "Insulin-Synthesis Genes" (Atty Docket No: PB-0008
US) filed January 7, 1999, all of which are incorporated by reference.

Polymorphism discovery was by fSSCP, as described in section F "Identification and
Characterization of Polymorphisms", subsection b5, for polymorphisms referred to in Table 3 as
source wetSNPs. Polymorphisms referred to as source isSNPs were discovered as described in
section F "Identification and Characterization of Polymorphisms", subsection a. Polymorphisms
referred to as source dbSNPs are polymorphisms in public genomic sequence where gene structure is
unknown. The polymorphisms were mapped to cDNA sequences in the LifeSeqGold database
(Incyte) to identify gene identity.

6. Frequency of Polymorphisms in Diabetes Associated Genes and Polynucleotides in
Various Populations

Polymorphisms identified in EXAMPLES 4 and 5 were genotyped against populations
described below by fSSCP or FP-TDI as described above. The results of the population frequency
studies are given in Table 2.

Two panels of human DNA have been developed to support the identification of frequent
SNPs within an ethnically diverse population. The genomic Human Diversity Panel will be used
where full genomic structure is available, and allows screening of the open reading frame of the gene,
including splice junctions. In instances where genomic structure for selected candidate genes may not
be available, a cDNA version of the HDP Screening Panel permits screening of the open reading
frame of the gene.

This DNA panel is derived from 47 consented individuals from four ethnic groups (Caucasian,
African-American, Asian and Hispanic). The panel is sufficiently sized to enable identification of 95%
of SNPs with allele population frequencies >= 5%. Comparable utility of our panel with the NIH
Diversity panel was demonstrated by parallel screening of 90 kilobases of coding sequence from each
panel.
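
A rough sense of why a panel of this size suffices can be had from the following sketch. It assumes a simple model that is not stated in the disclosure: each of the 2N chromosomes of N diploid individuals carries the allele independently with probability equal to its population frequency, and any allele present at least once is detected. Assay sensitivity and differences between ethnic groups are ignored, so the 95% figure above may rest on different assumptions. The function name is hypothetical.

```python
def detection_probability(n_individuals: int, allele_freq: float) -> float:
    """Probability that an allele of the given population frequency appears at
    least once among the 2 * n_individuals chromosomes of a diploid panel."""
    if n_individuals <= 0:
        raise ValueError("panel must contain at least one individual")
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** chromosomes


panel_size = 47  # number of individuals stated in the text
for freq in (0.01, 0.05, 0.10):
    prob = detection_probability(panel_size, freq)
    print(f"allele frequency {freq:.2f}: P(seen at least once) = {prob:.3f}")
```

Under this model an allele at 5% frequency is seen at least once in roughly 99% of panels of 47 individuals, which leaves some margin above the 95% figure for imperfect detection.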

A cDNA counterpart to our Human Diversity Panel has been generated from lymphoblastoid 
cell lines to obviate the need for intron/exon structure in 50% of human genes. In the absence of 
genomic structure, this methodology will be employed to screen the entire open reading frame of the 
gene. 

Various modifications and variations of the described compositions, methods, and systems of
the invention will be apparent to those skilled in the art without departing from the scope and spirit of
the invention. Although the invention has been described in connection with certain embodiments, it
should be understood that the invention as claimed should not be unduly limited to such specific
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invention to the precise forms disclosed. Furthermore, elements from one embodiment can be readily 

TABLE 1



AACT 

Full name : alpha-1-antichymotrypsin
Link : AACT_link_cdna 

Subsequence GB:AACT 1 1520

CDS GB:AACT.1 1302 bp #2

ORF 12 1313
Allele GB:AACT 1 36 36

source isSNP SNP00027203

consequence GB:AACT.1 2
Allele GB:AACT 1 269 269

source isSNP SNP00073834

consequence GB:AACT.1 2
Allele GB:AACT

source 

consequence 
Allele GBiAACT 

source 



1 830 830 

isSNP SNP00047132 
GB:AACT.l 2 
1 836 836 

isSNP SNP00043844 



Missense 
A>G 



Silent 
A>G 



Silent 
A>Q 



consequence GB : AACT . 1 



Allele 



Allele 



Allele 



Allele 



Allele 



GB : AACT 
source 
consequence 
QB : AACT 
source 
consequence 
GB : AACT 



Silent 
A>G 



275-275 



cons equenc e 
QBiAACT 
source 



Missense 
A>G 



Silent 
A>G 



Silent 
G>T 



276-276 



279-279 



281-281 



Stop 
A>G 



312-312 



406-406 



837 837 
isSNP SNP00101207 
GB:AACT.l 2 

848 848 
isSNP SNP00101208 
GB:AACT.l 2 

854 854 
isSNP SNP00052361 
GB:AACT.l 2 

947 947 
isSNP SNP00059862 
consequence QB:AACT.l. 2 
GB:AACT 1 1227 1227 

source isSNP SNP00d46B72 

consequence QB:AACT.l 2 
GIF AACT-cdna-fwd.gif 
Link : FL_2114865_link_genoinic 

Subsequence GB : AL049839_2 1 

Subsequence AACT_in3ma_build.l 
Subsequence AACT_cds . 2 

CDS AACT_cds.2 651 bp 
exon 59542 60184 
67441 67448 
AACT_inma_bui 1 d . : 
59531 60184 
64295 64568 
67441 67591 
68711 69154 
GB.:AL049839_2 

source isSNP SNP00027203 

source wetSNP GB : AL049839_2 . v59566 .G>A 

consequence AACT_cds . 2 5 Missense 9-9 A>T 

GB:AL049839_2 3 59799 59799 A>G 

source isSNP SNP00073a34 

consequence AACT_cds.2 5 Silent 86-86 F 

GB:AL049839_2 3 59844 59844 A>G 
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Missense 



214520 
59531 69154 #4 



exon 

exon 
exon 
exon 
exon 
Allele 



Allele 



Allele 



59542 67448 
exons 



1523 bp 



exons 



59566 59566 A>G



isSNP SNP00005018
consequence AACT_cds.2 5 Silent
GB:AL049839_2 3 60144 60144 A>G

source isSNP SNP00093217

consequence AACT_cds.2 5 Silent
GB:AL049839_2 3 64470 64470 A>G

source isSNP SNP00047132

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 64476 64476 A>G

source isSNP SNP00043844

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 64477 64477 A>G

source isSNP SNP00101207

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 64488 64488 A>G

source isSNP SNP00101208

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 64494 64494 A>G

source isSNP SNP00052361

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 65434 65434 A>G

source isSNP SNP00052361

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 65440 65440 A>G

source isSNP SNP00101208

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 65451 65451 A>G

source isSNP SNP00101207

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 65452 65452 A>G

source isSNP SNP00043844

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 65458 65458 A>G

source isSNP SNP00047132

consequence AACT_cds.2 5 Intron
GB:AL049839_2 3 68858 68858 A>G

source isSNP SNP00046872

consequence AACT_cds.2
GB:AL049839_2 3
source wetSNP
consequence AACT_cds.2
'-genomic- fwd . gi f 



68882 68882 A>G 
GB:AL049839_2 .v68882 . A>G 
5 3' 



ABL1

Full name : v-abl Abelson murine leukemia viral oncogene homolog 1
Link : ABL1_link_cdna

Subsequence GB:NM_005157 1 5744 #6

CDS GB:NM_005157.1 3393 bp #7

ORF 365 3757

Allele GB:NM_005157 6 1916 1916 C>G

source isSNP SNP00046020

consequence GB:NM_005157.1 7 Missense 518

Allele GB:NM_005157 6 2716 2716 C>G



source isSNP SNP00068702

consequence GB:NM_005157.1
GB:NM_005157 6 3625

source isSNP SNP00098956

consequence GB:NM_005157.1
GB:NM_005157 6 3688

source isSNP SNP00012765

consequence GB:NM_005157.1
GB:NM_005157 6 3894

source isSNP SNP00046021

consequence GB:NM_005157.1
GB:NM_005157 6 4612

source isSNP SNP00051528

consequence GB:NM_005157.1
GB:NM_005157 6 5512

source isSNP SNP0001276S

consequence GB:NM_005157.1
GIF ABL1-cdna-fwd.gif
Link : ABL1_link_genomic



Allele 



Allele 



Allele 



Allele 



Allele 



Silent 
A>G 



Silent 
A>G 



Silent 
C>Q 



1087.-1087 



1108-1108 



7 

5512 



Svibsequence 
Subsequence 
Subsequence 
Subsec[uence 
Subsegpaence 
subsequence 
Subsequence 
C3DS ABIil_cds . 1 



exon 73887 73965 

exon 85951 86124 

exon 86688 86983 

exon 94650 94922 

exon 104016 

exon 104747 

exon 106755 

exon 109237 

exon 110890 

exon 111322 

exon 114793 
CDS ABLl_cds . 2 



' ABLl_cds.l 73887 116507 #8 

ABLl_cds.2 29132 116507 #9 

aB:U07561_l 1 35962 #10 

GB:U07563_1 36063 120601 #11 

ABLl_mma_build.l 73506 118495 

ABLl_jnma_build.2 28792 116507 

ABLl_itima_build.3 73724 116507 
3393 bp 11 exons #8 



• #12 
#13 
#14 



exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 



exon 
exon 
exon 
exon 



29132 29267 
85951 86124 
86688 86983 
94650 94922 
104016 
104747 
106755 
109237 
110890 
111322 
114793 



104100 
104924 
106939 
109389 
110979 
111486 
116507 
3450 bp 11 exons 



104100 
104924 
106939 
109389 
110979 
111486 
116507 



ABLl_inma_build.l 5762 bp 
73506 73965 
85951 86124 
86688 86983 
94650 94922 



11 exons 



exon 


104016 


104100 


exon 


104747 


104924 


exon 


106755 


106939 


exon 


109237 


109389 




110890 


110979 


exon 


111322 


111486 


exon 


114793 


118495 


mBNA 


ABLl_jnma_build.2 3787 bp 11 exons #13 




28792 29267 




exon 


85954 86124 




exon 


86688 86983 




exon 


94650 94922 




exon 


104016 


104100 


exon 


104747 


104924 


exon 


106755 


106939 


exon 


109237 


109389 


exon 


110890 


110979 


exon 


111322 


111486 


exon 


114793 


116507 


mKNA 


ABLl_inma_build.3 3556 bp 11 exons #14 




73724 73965 




exon 


Q a £;i R^C 1 9/1 




exon 


85688 869 83 




exon 


94650 94922 




exon 


104016 


104100 


exon 




104924 


exon 


106755 


106939 


exon 


109237 


109389 


exon 


110890 


110979 


exon 


111322 


111486 


exon 


114793 


116507 


Allele 


QB:U07561_1 


10 29061 29061 A>G 




source 


isSNP SNP00120072 




consequence 


ABIil_cds.l 8 5' 




consequence 


ABI.l_cds.2 9 5' 


Allele 


GB:U07561_1 


10 30837 30837 A>G 




source 


dbSNP gnl)dbSNP|ss642659_allele 




source 


dbSNP gnlldbSNP|ssl045108_allele 




source 


dbSNP gnl|dbSNP|ssl044696_allele 




consequence 


ABLl_cds .1 8 5 ' 




consequence 


ABLl_cds.2 9 • Intron 


Allele 


GB:U07563_1 


11 35864 35864 A>G 




source 


isSNP SNP00048470 




consequence 


ABLl_cds.l 8 5' 




consequence 


ABLi_cds . 2 9 Intron 


Allele 


GB:U07563_1 


11 58876 58876 OG 




source 


wetSNP GB:U07563_1 .V58876 .OG 




consequence 


ABLl_cds . 1 8 Intron 




consequence 


ABLl_cds.2 9 Intron 


Allele 


GB:U07563_1 


11 68640 68640 A>a 




source 


wetSNP GB:U07553_l.v68640.T>C 




consequence 


ABLl__Gds .18 ■ Intron 




consequence 


ABLl_cds.2 9 Intron 


Allele 


GB:U075e3_l 


11 74901 74901 A>G 




source 


wetSNP GB:U07563_l.v74901.A>G 



consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 



ABLl_cds.l 8 Silent 499-499 

ABLl_cds.2 9 Silent 518-518 

11 75298 75298 OQ 

isSNP SNP00046020 

ABLl_cds.l 8 Missense 518^518 

ABLl_cds.2 9 Missense 537-537 

11 78921 78921 A>G 
wetSNP GB:U07563_1 .v78921 . G>A 



consequence ABLl_cds . 1 
consequence ABLl_cds . 2 



Silent 
Silent 



623-623 
642-642 



GB:U07563. 

source 

consequence 



11 79239 79239 A>6 

wetSNP GB:U07563_l.v79239.G>A 
ABLl_cds.l 8 Silent 729-729" 

consequence ABLl_cds.2 9 Silent 748-748 

GB:U07563_1 11 79404 79404 C>G 
isSNP SNP00068702 

wetSNP QB:U07563_l.v79404.C>G 
ABLl_cds.l 8 Silent 784-784 

ABLl_cds.2 9 Silent 803-803. 

11 . 79657 79657 A>G 
wetSNP GB:U07563_l.v79657.C>T 
ABLl_cds.l 8 Missense 869-869 

ABLl_cds.2 9 Missense 888-888 

11 79750 79750 A>G 

wetSNP GB:U07563_l.v79750.C>T 



source 
source 
consequence 
consequence 
GB:U07563_1 
source 
consequence 
consequence 
QB:U07563_1 
source 
consequenci 



ABLl_cds.l 8 Missense 

consequence ABLl_cds.2 9 Missense 

QB:U07563_1 11 80313 80313 A>G 

source isSNP SNP00098966 

consequence ABlil_cds 

consec[uence ABLl_cds . 2 
GB:U07563_1 



900-900 
919-919 



source 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 

GB:U07563_1 

source 

consequence 

consequence 



Silent 1087-1087 
Silent 1106-1106 
11 80376 80376 A>G 

isSNP SNP00012765 

wetSNP GB:U07563_l.v80376.G>A 



ABLl_cds . 1 8 Silent 
ABLl_cds.2 9 Silent 
11 80582 80582 C>G 

isSNP SNP00046021 
ABLl_cds .18 3 ' 

ABi:.l_cds.2 9 3' 
11 81298 81298 A>G 

isSNP SNP00051628 
ABLl_cds.l 8 3' 
ABLl_cds .2 9 3 ' 

11 81806 81806 A>G 

isSNP SNP00012766 
ABLl_cds .1 8 3 ' 

ABLl_cds.2 9 3' 
11 82199 82199 A>G 

isSNP SNP00012768 
ABLl_cds .18 3 ' 

ABLl_cds .2 9 3 ' 



1108-1108 
1127-1127 



A>P 
A>P 



P>S 
P>S 



P>S 
P>S 



GIF ABLl-geno33aic-fwd.gif 



WO 03/054166



ADAM9 

Full name : a disintegrin and metalloproteinase domain 9 
Link : ADAM9_link_cdna

Subsequence GB:HSU41766 1 3865 



CDS GB:HSU41766 
ORP 79 



79 2538 
6B:HSU41766 
source 
consequence 
GB:HSU41766 
source 
consequence 
GB:HSU41766 
source 
consequence 
GB:HSU41766 
source 
consequence 
GB:HSU41766 
source 
consequence 
QB:HSU41766 



source 
consequence 
GIF ADAM9-cdna-fwd.gif 



2460 bp 

15 462 462 

isSNP SNP00060630 

GB:HSU41766.1 

15 1486 1486 

isSNP S3SnP00l22821 

GB:HSU41766.1 

15 1580 1580 

isSNP SNP00060631 

GB:HSU41766 .1 

15 2845 2845 

isSNP SNP00024957 

GB:HSU41766 .1 

15 3112 3112 

isSNP SNP00122822 

GB:HSU41766 .1 

15 3703 3703 

isSNP SNP00024958 

GB:HSU41766.1 



#15 
#16 



16 
A>G 



16 
G>T 



Missense 



#17 



OQ . 



G>T 



ADAMTS1

Full name : a disintegrin-like and metalloprotease (reprolysin type) with
thrombospondin type 1 motif, 1

Link : ADAMTS1_link_cdna
Subsequence GB:AF060152_1 1

CDS GB:AF060152_1.1 2853 bp

ORF 238 3090
Allele aB:AP060152_l 17 140 

source isSNP SNP00109009 

consequence QB : AF060152_1 . 1 
Allele GB:AF060152_1 17 282 

source isSNP SNP00071624 

consequence GB : AFO 6 0 1 52_1 . 1 
Allele GB:AF060152_1 17 768 

source isSNP SNP60069180 

consequence GB: AF060152_1 . 1 
Allele aB:AF060152_l 17 865 

source isSNP SNP00069181 

consequence GB : AF060152_1 . 1 
Allele • GB:AF060152_1 17 1686 

source isSNP SNP00033973 

consequence GB : AF060152_1 . 1 
Allele GB:AF060152_1 17 2294 

source isSNP SNP00109010 

consequence GB : AF060152_1 . 1 
Allele GB:AF060152_1 17 2370 

source isSNP SNP0003^74 



3430 
#18 



140 



282 



768 



18 
865 



1686 



2294 



18 

2370 



Silent 
G>T 



Silent 
OG 



Mxssense 
A>G 



Silent 
A>G 



Missense 
A>G 



consequence GB: AF060152_1 . 1 
Allele QB:AF060152_1 17 2958 

source isSNP SNP0003397 5 

consequence GB:AF060152_1.1 
GIF ADAMTSl-ccana-fwd.gif 



Silent 
A>G 



ADAMTS4

Full name : a disintegrin-like and metalloprotease (reprolysin type) with
thrombospondin type 1 motif, 4
Link : ADAMTS4_link_cdna

Subsequence GB:NM_005099_1 4301

CDS GB:NM_005099_1.1 2514 bp #20

ORF 401 2914
Allele GB:NM_005099_1 19 2970 2970 A>G

source isSNP SNP00022951

consequence GB:NM_005099_1.1 20 3'
Allele GB:NM_005099_1 19 3529 3529 A>G

source dbSNP gnl|dbSNP|ss610462_allele

consequence GB:NM_005099_1.1 20 3'
Allele GB:NM_005099_1 19 3533 3533 A>G

source dbSNP gnl|dbSNP|ss722414_allele

source dbSNP gnl|dbSNP|ss999631_allele

consequence GB:NM_005099_1.1 20 3'
Allele GB:NM_005099_1 19 3855 3855 A>G

source dbSNP gnl|dbSNP|ss1298908_allele

consequence GB:NM_005099_1.1 20 3'
GIF ADAMTS4-cdna-fwd.gif



AGC1

Full name : aggrecan 1
Link : AGC1_link_cdna

Subsequence GB:HUMAGPRO 1 7137 #21

CDS GB:HUMAGPRO.1 6951 bp #22

ORF 61 7011

Allele GB:HUMAGPRO 21 6495 6495 G>T

source isSNP SNP00010327

consequence GB:HUMAGPRO.1 22 Silent 2145-2145

GIF AGC1-cdna-fwd.gif
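
The coordinates in the entry above can be cross-checked with the short sketch below. It rests on two assumptions that are inferred from the figures and not stated in the table: coordinates are 1-based and inclusive, and the number following the consequence type (here 2145-2145) is the codon position within the ORF. The function names are hypothetical.

```python
def orf_length(start: int, end: int) -> int:
    """Length in bp of an ORF given 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("ORF end must not precede ORF start")
    return end - start + 1


def codon_number(position: int, orf_start: int) -> int:
    """1-based codon index of a nucleotide position within the ORF."""
    if position < orf_start:
        raise ValueError("position lies upstream of the ORF")
    return (position - orf_start) // 3 + 1


# Values from the entry above: ORF 61 to 7011, allele at position 6495.
print(orf_length(61, 7011))    # compare with the stated CDS length of 6951 bp
print(codon_number(6495, 61))  # compare with the stated position 2145
```

Both results agree with the figures in the entry, which supports this reading of the columns, although the table itself does not define them.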



ANK

Full name : human homolog of mouse ank gene
Link : ANK_fl_link_cdna

Subsequence FN:3255641CB1 1 1481

CDS FN:3255641CB1.1 1338 bp #24

ORF 106 1443
Allele FN:3255641CB1 23 258 258 

source isSNP SNP00073561 

consequence FN:3255641CB1.1 24 
Allele FN:3255641CB1 23 1048 1048 

148 



Silent 
OG 



source isSNP SNP00036339 

consequence FN: 325564iCBl . 1 . 

Allele FN:3255641CB1 23 1106 

source isSNP SNP00075037 

consequence FN: 3255641CB1 . 1 

Allele FN:3255641CB1 23 1373 

source isSNP SNP00045819 

consequence FN: 3255641CB1 . 1 

GIF ANK-cdna-fwd.gif, 
Link : ANK_link_cdna 

Subsequence GB: AF274753_1 1 

CDS GB:AF274753_1.1 1479 bp 
ORF 69 1547 

Allele GB:AF274753_1 . 25 362 

source isSNP SNP00073561 

consequence GB:AF274753_1 .1 

Allele . GB:AF274753_1 25 1152 

source isSNP SNP00036339 

consequence GB : AF274753_1 . 1 

Allele GB:AF274753_1 25 1210 

source isSNP SNP00075037 

consequence GB : AF274753_1 . 1 

Allele GB:AF274753_1 25 1477 

source isSNP SNP00045819 

consequence GB:AF274753_1.1 

GIF ANK-cdna-fwd.gif 
Link : ANK_link_genoinic 



24 Missense 315-315 
1106 A>Q 



24 

1373 A>G 



24 Missense . 423-423 



Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 



ANK_cds.l 26332 84281 
GBI:AC016575_6_000010 
GB:AC026437_2 706 
ANK_ituma_build .1 308 



ANK_cds . 2 



CDS ANK^cds.l 


1338 bp 


exon 


26332 


26503 


exon 


36882 


37000 


exon 


39535 


39618 


exon 


44240 


44410 


exon 


46173 


46307 


exon 


49517 


49609 


exon 


53557 


•53652 


exon 


78643 


78772 


exon 


81811 


81934 


exon 


82505 


82604 


exon 


84168 


84281 


CDS ANK_cds.2 


1479 bp 


exon 


272 


367 


exon 


26287 


26503 


exon 


36882 


37000 




39535 


39618 


exon 


44240 


44410 


exon 


46173 


46307 




49517 


49609 


exon 


53557 


53652 


exon 


78643 


78772 


exon 


81811 


81934 



272 84281 
11 exons 



1568 #25 
#26 



26 Silent 
,1152 OG 



26 Missense 
1210 A>G 



26 Missense 
1477 A>G 



Missense 



#27 

1 605 
92528 #29 
85658 #30 
#31 
#27 



exon 
exon 

itiRNA 

exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



82505 82604 
84168 84281 

ANK_inma_build.l 2820 bp 12 exons #30 

308 367 

26287 26503 

36882 37000 

39535 39618 

44240 44410 

46173 46307 

49517 49609 

53557 53652 

78643 78772 

81811 81934 

82505 82604 

84168 85658 

GB:AC026437_2 29 8413 8413 OG 

source dbSNP gnl | dbSNP| ss95678_allele 

consequence ANK_cds .1 27 5 ' 
consequence ANK_cds.2 31 Intron 
GB:AC026437_2. 29 14825 14825 A>G 

dbSNP gnl | dbSNP | ss619053_allele 
dbSNP gnl |dbSNP| ssl002004_allele 
dbSNP gnl 1 dbSNpj ss227983_allele 
dbSNP gnl|dbSNP|ss324626_allele 
ANK_cds.l 27 5' 

31 Intron 
25779 25779 A>G 
GB:AC026437_2 .v25779 .OT 
27 Silent 51-51 2 

31 Silent 9 8-98 i 

_2 29 25807 25807 A>G 

source isSNP SNP00104502 

source wetSNP GB: AC026437_2 .v25807 .Q>A 

consequence ANIK-cds.l 27 Intron 
consequence ANK_cds.2 31 Intron 
GB:AC026437_2 29 26433 26433 A>G 

source isSNP SNP00018441 

consequence ANK^cds.l 27 Intron 
consequence ANK^cds.2 31 Intron 
GB:AC026437_2 29 30696 30696 A>T 

source dbSNP gnl j dbSNP | ssl016631_allele 

source dbSNP gnl | dbSNP| ss389763_allele 

consequence ANK_cds.l 27 Intiron 
consequence ANK_cds.2 31 Intron 
GB:AC026437_2 29 34277 34277 A>G 

source isSNP SNP00101566 



source 
source 
source 
source 
consequence 
consequence 
GB:AC026437_2 . 2 
source wetSNP 
ponsequence ANK_cds 
consequence ANK^cds 
GB:AC026437_2 2 



AlS!K_cds . 2 



consequence ANK_cds . 1 
consequence ANK_cds . 2 
GB:AC026437_2 29 
source wetSNP 
consequence AlsrK_cds . 1 
consequence ANK_cds . 2 
GB:AC026437_2 29 
source isSNP SNP00056800 

consequence ANK^cds . 1 



27 Intron 
31 Intron 
36172 36172 A>G 
GB: AC026437_2 .v36172 .T>C 
27 Intron 
31 Intron 
37028 37028 G>T 



Intron 
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31 Intron 
37186 37186 G>T 



Intron 
Intron 



consequence ANK_cds . 2 
GB:AC02 6437_2 29 . 

source isSNP SNP00022144 

consequence ANK_cds.l 27 
consequence ANK_cds .2 31 
GB:AC026437_2 29 37205 37205 A>Q 

source isSNP SNP00022143 

consequence ANK_cds.l 27 
consequence ANK_cds.2 31 
GB:ACP26437_2 29 37340 37340 A>T 

source dbSNP gnl | dbSNPj ss469809_allele 

consequence ANIv_cds.l 27 Intron 
consequence ANK_cdB.2 31 intron 
GB:AC026437_2 29 52817 52817 G>T 



Intron 
Intron 



source wetSNP 
consequence ANK_cds.1 
consequence ANK_cds.2 
GB:AC026437_2 29 
source wetSNP 
consequence ANK_cds.1 
consequence ANK_cds.2 
GB:AC026437_2 29 
source wetSNP 
consequence ANK_cds.1 
consequence ANK_cds.2 
GB:AC026437_2 29 
source isSNP SNP00093702 

consequence ANK_cds.1 27 Intron 
consequence ANK_cds.2 31 Intron 
GB:AC026437_2 29 78010 78010 C>G 

source isSNP SNP00036339 

consequence ANK_cds.1 
consequence ANK_cds.2 
GB:AC026437_2 29 78875 78875 A>G 

source isSNP SNP00095793 



GB:AC026437_2.v52817.C>A 
27 Intron 
31 Intron 
52899 52899 A>G 
GB:AC026437_2.v52899.A>G 
27 Silent 274-274 

31 Silent 321-321 

52962 52962 G>T 
GB:AC026437_2.v52962.T>G 
27 Intron 
31 Intron 
63950 63950 A>G 



Missense 
Missense 



315-315 
362-362 



A>P 
A>P 



27 
31 



Intron 
Intron 
81235 81235 A>G 
GB:AC026437_2.v81235.T>C 
27 Intron 
31 Intron 
82852 82852 A>G 



Intron 
Intron 



consequence ANK_cds.1 
consequence ANK_cds.2 
GB:AC026437_2 29 
source wetSNP 
consequence ANK_cds.1 
consequence ANK_cds.2 
GB:AC026437_2 29 
source isSNP SNP00120424 
consequence ANK_cds.1 27 
consequence ANK_cds.2 31 
GB:AC026437_2 29 83057 83057 A>G 
source isSNP SNP00120425 
consequence ANK_cds.1 27 
consequence ANK_cds.2 31 
GB:AC026437_2 29 83506 83506 A>G 
source isSNP SNP00045819 
consequence ANK_cds.1 27 Missense 
consequence ANK_cds.2 31 Missense 
GB:AC026437_2 29 83587 83587 A>G 
source wetSNP GB:AC026437_2.v83587.G>A 



Intron 
Intron 



423-423 
470-470 



S>F 
S>F 
TABLE 1 (Cont.) 

consequence ANK_cds.1 27 3' 
consequence ANK_cds.2 31 3' 
GB:AC026437_2 29 83607 83607 A>G 
source isSNP SNP00008779 
source wetSNP GB:AC026437_2.v83607.A>G 
consequence ANK_cds.1 27 3' 
consequence ANK_cds.2 31 3' 
GB:AC026437_2 29 84086 84086 A>G 
source isSNP SNP00012596 
consequence ANK_cds.1 27 3' 
consequence ANK_cds.2 31 3' 
GB:AC026437_2 29 84156 84156 A>G 
source isSNP SNP00045820 
consequence ANK_cds.1 27 3' 
consequence ANK_cds.2 31 3' 
GB:AC026437_2 29 84896 84896 G>T 
source isSNP SNP00045822 
consequence ANK_cds.1 27 3' 
consequence ANK_cds.2 31 3' 
GIF ANK-genomic-fwd.gif 



BGLAP 

Full name : Bone Gla Protein 

Link : FL_104137_link_genomic 

Subsequence GB:AC007227_2_104137CD1 35521 34594 #32 
Subsequence GB:AC007227_2 1 167932 #33 
Subsequence BGLAP_mrna_build.1 35539 34461 #34 
mRNA BGLAP_mrna_build.1 451 bp 4 exons 
exon 35539 35458 
exon 35200 35162 
exon 34991 34922 
exon 34720 34461 
CDS GB:AC007227_2_104137CD1 300 bp 4 exons #32 
exon 35521 35458 
exon 35200 35162 
exon 34991 34922 
exon 34720 34594 
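
The "bp" figures in the BGLAP rows above can be cross-checked against the exon coordinates. The following is a minimal sketch, assuming each exon row gives inclusive start and end coordinates and that the stated length is the sum of the exon lengths; the helper name `spliced_length` is illustrative and does not come from the table.

```python
def spliced_length(exons):
    """Sum inclusive exon lengths; abs() handles reverse-strand rows (start > end)."""
    return sum(abs(start - end) + 1 for start, end in exons)

# Exon rows copied from the BGLAP mRNA and CDS entries above (reverse strand).
bglap_mrna = [(35539, 35458), (35200, 35162), (34991, 34922), (34720, 34461)]
bglap_cds = [(35521, 35458), (35200, 35162), (34991, 34922), (34720, 34594)]

print(spliced_length(bglap_mrna))  # 451, the stated mRNA length
print(spliced_length(bglap_cds))   # 300, the stated CDS length
```

The same check can be applied to any other mRNA or CDS entry in the table to spot coordinates that were damaged in reproduction.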

Allele GB:AC007227_2 33 34618 34618 C>G 
source wetSNP GB:AC007227_2.v34618.G>C 
consequence GB:AC007227_2_104137CD1 32 Silent 
Allele GB:AC007227_2 33 34977 34977 G>T 
source wetSNP GB:AC007227_2.v34977.G>T 
consequence GB:AC007227_2_104137CD1 32 Missense 
Allele GB:AC007227_2 33 35228 35228 C>G 
source isSNP SNP00038471 
consequence GB:AC007227_2_104137CD1 32 Intron 
GIF BGLAP-genomic-rev.gif 



BGN 

Full name : BGN 
Link : BGN_link_cdna 



Subsequence GB:HUMHPGI 1 #35 
CDS GB:HUMHPGI.1 1107 bp #36 
ORF 121 1227 
Allele GB:HUMHPGI 35 7 
source isSNP SNP00011488 
consequence GB:HUMHPGI.1 36 
Allele GB:HUMHPGI 35 261 261 A>G 
source isSNP SNP00011489 
consequence GB:HUMHPGI.1 36 Silent 
Allele GB:HUMHPGI 35 660 660 A>G 
source isSNP SNP00011490 
consequence GB:HUMHPGI.1 36 Silent 
Allele GB:HUMHPGI 35 1355 1355 A>G 
source isSNP SNP00092805 
consequence GB:HUMHPGI.1 36 3' 
GIF BGN-cdna-fwd.gif 
Link : BGN_link_genomic 

Subsequence GB:U82695 1 76145 #37 
Subsequence GB:U82695_2540367CD1 18042 21854 #38 
Subsequence BGN_mrna_build.1 8415 22311 #39 
CDS GB:U82695_2540367CD1 1107 bp 7 exons #38 
exon 18042 18279 
exon 18648 18760 
exon 19272 19485 
exon 19938 20048 
exon 20239 20332 
exon 20456 20594 
exon 21657 21854 
mRNA BGN_mrna_build.1 
exon 8415 8523 
exon 18031 18279 
exon 18648 18760 
exon 19272 19485 
exon 19938 20048 
exon 20239 20332 
exon 20456 20594 
exon 21657 22311 
Allele GB:U82695 37 

source 



37 8484 8484 G>T 

isSNP SNP00011488 



Allele 



consequence GB:U82695_2540367CD1 



38 



GB:U82695 

source 

consequence 

GB:U82695 

source 

source 

consequence 

GB:U82695 

source 



18161 18161 A>G 
wetSNP GB:U82695.v18161.A>G 
GB:U82695_2540367CD1 38 Silent 

37 18182 18182 A>G 

isSNP SNP00011489 

wetSNP GB:U82695.v18182.G>A 
GB:U82695_2540367CD1 38 Silent 

37 18330 18330 A>G 

wetSNP GB:U82695.v18330.G>A 



consequence GB:U82695_2540367CD1 



38 



Intr 



GB:U82695 37 18354 18354 A>G 
source wetSNP GB:U82695.v18354.G>A 
consequence GB:U82695_2540367CD1 38 Intron 
GB:U82695 37 19460 19460 A>G 
source wetSNP GB:U82695.v19460.T>C 
consequence GB:U82695_2540367CD1 38 Silent 



GB:U82695 

source 

consequence 

GB:U82695 

source 

consequence 

GB:U82695 

source 

consequence 

GB:U82695 



consequence 



37 21566 21566 G>T 

wetSNP GB:U82695.v21566.G>T 

GB:U82695_2540367CD1 38 Intron 

37 21639 21639 A>G 

wetSNP GB:U82695.v21639.C>T 



GB:U82695_2540367CD1 
37 21982 21982 A>G 

isSNP SNP00092805 
GB:U82695_2540367CD1 
37 22172 22172 G>T 
isSNP SNP00011491 
GB:U82695_2540367CD1 



38 



Intron 



GIF BGN-genomic-fwd.gif 



BHLHB2 

Full name : basic helix-loop-helix domain containing 
Link : BHLHB2_link_cdna 

Subsequence GB:AB004066_1 1 2922 #40 
CDS GB:AB004066_1.1 1239 bp #41 
ORF 197 1435 
Allele GB:AB004066_1 40 196 196 A>G 
source isSNP SNP00062724 
consequence GB:AB004066_1.1 41 5' 
Allele GB:AB004066_1 40 829 829 A>G 
source isSNP SNP00046376 
consequence GB:AB004066_1.1 41 Silent 
Allele GB:AB004066_1 40 2070 2070 A>G 
source isSNP SNP00013041 
consequence GB:AB004066_1.1 41 
Allele GB:AB004066_1 40 2323 2323 
source isSNP SNP00013042 
consequence GB:AB004066_1.1 41 
GIF BHLHB2-cdna-fwd.gif 




Full name : BMP2 
Link : BMP2_link_cdna 

Subsequence GB:HUMBMP2A 1 #42 
CDS GB:HUMBMP2A.1 1191 bp #43 
ORF 324 1514 
Allele GB:HUMBMP2A 42 584 584 
source isSNP SNP00015730 
consequence GB:HUMBMP2A.1 
Allele GB:HUMBMP2A 42 760 760 
source isSNP SNP00015731 
consequence GB:HUMBMP2A.1 
Allele GB:HUMBMP2A 42 984 984 
source isSNP SNP00015732 
consequence GB:HUMBMP2A.1 43 Missense 221- 
Allele GB:HUMBMP2A 42 1484 1484 A>G 
source isSNP SNP00015733 
consequence GB:HUMBMP2A.1 43 Silent 387- 
GIF BMP2-cdna-fwd.gif 
Link : FL_3220019_link_genomic 

Subsequence GB:HS859D4 1 178870 #44 
Subsequence GB:HS859D4_3220019CD1 176685 167723 
Subsequence BMP2_mrna_build.1 178252 167687 #46 
mRNA BMP2_mrna_build.1 1547 bp 3 exons #46 
exon 178252 177937 
exon 176692 176340 
exon 168564 167687 
CDS GB:HS859D4_3220019CD1 1188 bp 
exon 176685 176340 
exon 168564 167723 
Allele GB:HS859D4 44 167750 167' 
source isSNP SNP00015733 
consequence GB:HS859D4_3220019CD1 




H>N 
Allele 



R>S 
Allele 



T>I 
Allele 



GB:HS859D4 
source 



44 168250 
isSNP SNP00015732 



consequence GB:HS859D4_3220019CD1 



GB:HS859D4 

source 

consequence 

GB:HS859D4 
source 
consequence 

GB:HS859D4 
source 
source 
consequence 



Missense 



44 168341 168341 A>T 

wetSNP GB:HS859D4.v168341.T>A 



Missense 



GB:HS859D4_3220019CD1 

44 168474 168474 A>G 

isSNP SNP00015731 

GB:HS859D4_3220019CD1 45 Missense 

44 176425 176425 A>G 

isSNP SNP00015730 

wetSNP GB:HS859D4.v176425.T>C 

GB:HS859D4_3220019CD1 45 Silent 



GIF BMP2-genomic-rev.gif 



BMP4 

Full name : BMP4 
Link : BMP4_link_cdna 

Subsequence GB:HUMBMP2B 1 1751 #47 
CDS GB:HUMBMP2B.1 1227 bp #48 
ORF 395 1621 
Allele GB:HUMBMP2B 47 308 308 A>G 
source isSNP SNP00074676 
consequence GB:HUMBMP2B.1 48 
Allele GB:HUMBMP2B 47 849 849 A>G 
source isSNP SNP00000573 
consequence GB:HUMBMP2B.1 48 
GIF BMP4-cdna-fwd.gif 
Link : BMP4_link_genomic 

Subsequence GB:HSU43842 1 #49 






Subsequence GB:HSU43842_1613615CD1 7798 9984 #50 
Subsequence BMP4_mrna_build.1 3207 10117 #51 
mRNA BMP4_mrna_build.1 4 exons #51 
exon 3207 3468 
exon 6620 6744 
exon 7791 8167 
exon 9131 10117 
CDS GB:HSU43842_1613615CD1 1224 bp 
exon 7798 8167 
exon 9131 9984 
Allele GB:HSU43842 49 6665 6665 A>G 
source isSNP SNP00074676 
consequence GB:HSU43842_1613615CD1 50 5' 
Allele GB:HSU43842 49 7752 7752 A>G 
source isSNP SNP00117542 
consequence GB:HSU43842_1613615CD1 50 5' 
Allele GB:HSU43842 49 9215 9215 A>G 
source isSNP SNP00000573 
source wetSNP GB:HSU43842.v9215.C>T 
consequence GB:HSU43842_1613615CD1 50 Missense 

A>V 

GIF BMP4-genomic-fwd.gif 



BMP6 

Full name : BMP6 
Link : BMP6_link_cdna 

Subsequence GB:HUMTGFBC 1 2923 #52 
CDS GB:HUMTGFBC.1 1542 bp #53 
ORF 160 1701 
Allele GB:HUMTGFBC 52 1263 1263 
source isSNP SNP00069306 
consequence GB:HUMTGFBC.1 
Allele GB:HUMTGFBC 52 2280 2280 
source isSNP SNP00021640 
consequence GB:HUMTGFBC.1 
Allele GB:HUMTGFBC 52 2436 2436 
source isSNP SNP00003240 
consequence GB:HUMTGFBC.1 
Allele GB:HUMTGFBC 52 2574 2574 
source isSNP SNP00021639 
consequence GB:HUMTGFBC.1 
GIF BMP6-cdna-fwd.gif 



CAPN4 

Full name : calpain, small polypeptide 

Link : FL_508926_link_genomic 

Subsequence GB:CH19F24590 1 41369 #54 
Subsequence GB:CH19F24590_3639962CD1 31006 39830 
Subsequence FL_3639962_mrna_build.1 30073 40241 #56 
Subsequence CAPN4_cds.1 31006 39833 #57 
mRNA FL_3639962_mrna_build.1 1309 bp 11 exons 
exon 30073 30151 
exon 30991 31214 
exon 32294 32327 
exon 32646 32735 
exon 32903 32960 
exon 33058 33122 
exon 35800 35868 
exon 35970 36048 
exon 36190 36306 
exon 39572 39630 
exon 39807 40241 
CDS CAPN4_cds.1 717 bp 
exon 31006 31214 
exon 32294 32327 
exon 32903 32960 
exon 33058 33122 
exon 35800 35868 
exon 35970 36048 
exon 36190 36306 
exon 39572 39630 
exon 39807 39833 
CDS GB:CH19F24590_3639962CD1 804 bp 10 exons #55 
exon 31006 31214 
exon 32294 32327 
exon 32646 32735 
exon 32903 32960 
exon 33058 33122 
exon 35800 35868 
exon 35970 36048 
exon 36190 36306 
exon 39572 39630 
exon 39807 39830 



CBFA1 

Full name : CBFA1 
Link : CBFA1_link_cdna 

Subsequence GB:HUMCBFA 1 1411 #58 
CDS GB:HUMCBFA.2 1323 bp #59 
ORF 1 1323 
Allele GB:HUMCBFA 58 260 260 
source isSNP SNP00063798 
consequence GB:HUMCBFA.2 87-87 G>E 
GIF CBFA1-cdna-fwd.gif 
Link : CBFA1_link_genomic 

Subsequence GB:HSCBFA1S1 1 93 #60 
Subsequence GB:HSCBFA1S2 194 669 #61 
Subsequence GB:HSCBFA1S3 770 1034 #62 
Subsequence GB:HSCBFA1S4 1135 1381 #63 
Subsequence GB:HSCBFA1S5 1482 1759 #64 
Subsequence GB:HSCBFA1S6 1860 2081 #65 
Subsequence GB:HSCBFA1S7 2182 2301 #66 
Subsequence GB:HSCBFA1S8 2402 3033 #67 
Subsequence CBFA1_cds.1 28 2948 #68 
CDS CBFA1_cds.1 1565 bp 8 exons #68 
exon 28 85 
exon 261 625 
exon 821 977 
exon 1198 1302 
exon 1533 1706 
exon 1881 2042 
exon 2201 2266 
exon 2470 2948 
Allele GB:HSCBFA1S3 62 177 177 A>G 
source wetSNP GB:HSCBFA1S3.v177.C>T 
consequence CBFA1_cds.1 68 Silent 183-183 
Allele GB:HSCBFA1S8 67 490 490 A>G 
source wetSNP GB:HSCBFA1S8.v490.C>T 
consequence CBFA1_cds.1 68 Silent 503-503 
GIF CBFA1-genomic-fwd.gif 



CD36 

Full name : CD36 Glycoprotein 
Link : CD36_link_cdna 

Subsequence EM:HSCD3621 1 2216 #69 
Allele EM:HSCD3621 69 123 123 G>T 
source isSNP SNP00011023 
Allele EM:HSCD3621 69 195 195 A>G 
source isSNP SNP00096573 
Allele EM:HSCD3621 69 230 230 C>G 
source isSNP SNP00110263 
Allele EM:HSCD3621 69 827 827 A>G 
source isSNP SNP00115780 
Allele EM:HSCD3621 69 1332 1332 A>G 
source isSNP SNP00096574 





Link : CD36_link_genomic 

Subsequence CD36_link_cds.1 2094 6548 #70 
Subsequence EM:HSCD36G1 101 236 #71 
Subsequence EM:HSCD36A 338 2898 #72 
Subsequence EM:HSCD36G4 3000 3220 #73 
Subsequence EM:HSCD36G5 3322 3529 #74 
Subsequence EM:HSCD36AA 3631 3999 #75 
Subsequence EM:HSCD36G7 4101 4252 #76 
Subsequence EM:HSCD36G8 4354 4460 #77 
Subsequence EM:HSCD36G9 4562 4691 #78 
Subsequence EM:HSCD36G10 4793 5042 #79 
Subsequence EM:B74110 5144 5803 #80 
Subsequence EM:HSCD36G12 5905 6038 #81 
Subsequence EM:HSCD36G13 6140 6252 #82 
Subsequence EM:HSCD36G14 6354 6847 #83 
Subsequence EM:HSCD36G15 6949 7632 #84 
Subsequence CD36_mrna_build.1 136 7602 #85 
mRNA CD36_mrna_build.1 2217 bp 16 exons 
exon 136 206 
exon 1446 1539 
exon 2005 2213 
exon 3030 3190 
exon 3352 3499 
exon 3719 3898 
exon 4131 4222 
exon 4384 4430 
exon 4592 4661 
exon 4824 5011 
exon 5265 5383 
exon 5935 6008 
exon 6168 6222 
exon 6384 6548 
exon 6979 7071 
exon 7152 7602 
CDS CD36_link_cds.1 
exon 2094 2213 
exon 3030 3190 
exon 3352 3499 
exon 3719 3898 
exon 4131 4222 
exon 4384 4430 
exon 4592 4661 
exon 4824 5011 
exon 5265 5383 
exon 5935 6008 
exon 6168 6222 
exon 6384 6548 



EM:HSCD36A 72 1160 1160 
source isSNP SNP00011023 
consequence CD36_link_cds.1 
EM:HSCD36A 72 1698 1698 
source isSNP SNP00096573 
consequence CD36_link_cds.1 
EM:HSCD36A 72 1732 1732 
source isSNP SNP00110263 
consequence CD36_link_cds.1 
EM:HSCD36G4 73 102 102 C>G 
source wetSNP EM:HSCD36G4.v102.G>C 
consequence CD36_link_cds.1 70 Silent 
EM:HSCD36AA 75 232 232 A>G 
source isSNP SNP00115780 
consequence CD36_link_cds.1 70 Silent 
EM:HSCD36G10 79 92 92 A>G 
source wetSNP EM:HSCD36G10.v92.T>C 
consequence CD36_link_cds.1 70 Silent 
EM:B74110 80 193 193 
source isSNP SNP00096574 
consequence CD36_link_cds.1 
EM:HSCD36G14 83 198 203 AAGTAT>AT 
source wetSNP EM:HSCD36G14.v198.AAGTAT>AT 
consequence CD36_link_cds.1 70 3' 
EM:HSCD36G14 83 421 421 A>G 
source isSNP SNP00041723 
consequence CD36_link_cds.1 70 3' 
-genomic-fwd.gif 



CD68 

Full name : CD68 antigen 
Link : FL_3777141_link_cdna 

Subsequence FN:3777141CB1 1 1558 #86 
CDS FN:3777141CB1.1 1065 bp #87 
ORF 75 1139 
Allele FN:3777141CB1 86 834 834 
source isSNP SNP00006442 
consequence FN:3777141CB1.1 87 Missense 254-254 
Allele FN:3777141CB1 86 1394 1394 G>T 
source dbSNP gnl|dbSNP|ss450666_allele 
consequence FN:3777141CB1.1 87 3' 
Allele FN:3777141CB1 86 1475 1475 G>T 
source isSNP SNP00108664 
consequence FN:3777141CB1.1 87 3' 
GIF CD68-cdna-fwd.gif 
Link : FL_1803929_link_genomic 

Subsequence GB:AC007421_12 1 95240 #88 
Subsequence GB:AC007421_12_3777141CD1 92493 90660 #89 
Subsequence FL_3777141_mrna_build.1 92567 90242 #90 
mRNA FL_3777141_mrna_build.1 1557 bp 6 exons #90 
exon 92567 92445 
exon 92361 91844 
exon 91705 91586 
exon 91460 91388 
exon 91275 91105 
exon 90793 90242 
CDS GB:AC007421_12_3777141CD1 1065 bp 6 exons #89 
exon 92493 92445 
exon 92361 91844 
exon 91705 91586 
exon 91460 91388 
exon 91275 91105 
exon 90793 90660 
Allele GB:AC007421_12 88 90404 90404 G>T 
source dbSNP gnl|dbSNP|ss450666_allele 
consequence GB:AC007421_12_3777141CD1 89 3' 
Allele GB:AC007421_12 88 90707 90707 A>G 
source wetSNP GB:AC007421_12.v90707.C>T 
consequence GB:AC007421_12_3777141CD1 89 Missense 340-340 A>T 
Allele GB:AC007421_12 88 91388 91388 G>T 
source wetSNP GB:AC007421_12.v91388.G>T 
consequence GB:AC007421_12_3777141CD1 89 Missense 254-254 Q>K 
Allele GB:AC007421_12 88 92357 92357 A>G 
source wetSNP GB:AC007421_12.v92357.C>T 
consequence GB:AC007421_12_3777141CD1 89 Silent 18-18 Q 
GIF CD68-genomic-rev.gif 



Full name : cysteine dioxygenase type I 
Link : CDO1_link_cdna 

Subsequence GB:HHSCYSDIO 1 
CDS GB:HHSCYSDIO.1 603 bp 
ORF 255 857 
Allele GB:HHSCYSDIO 91 100 
source isSNP SNP00009024 
consequence GB:HHSCYSDIO.1 
Allele GB:HHSCYSDIO 91 737 
source isSNP SNP00048574 
consequence GB:HHSCYSDIO.1 
Allele GB:HHSCYSDIO 91 784 
source isSNP SNP00036859 
consequence GB:HHSCYSDIO.1 
Allele GB:HHSCYSDIO 91 1082 
source isSNP SNP00107326 
consequence GB:HHSCYSDIO.1 
Allele GB:HHSCYSDIO 91 1525 
source isSNP SNP00036860 
consequence GB:HHSCYSDIO.1 
GIF CDO1-cdna-fwd.gif 
Link : CDO1_link_genomic 



1556 
#92 



92 
737 



92 
784 



92 

1082 



Silent 
A>G 



Missense 
A>G 



1525 A>G 



Subsequence CDO1_cds.1 1653 4275 #93 
Subsequence GB:D85778_1 1 2601 #94 
Subsequence GB:D85779_1 2702 2938 #95 
Subsequence GB:D85780_1 3039 3525 #96 
Subsequence GB:D85781_1 3626 4090 #97 
Subsequence GB:D85782_1 4191 4921 #98 
Subsequence CDO1_mrna_build.1 1402 4921 # 
mRNA CDO1_mrna_build.1 1500 bp 5 exons 
exon 1402 1822 
exon 2789 2866 
exon 3178 3332 
exon 3777 3946 
exon 4246 4921 
CDS CDO1_cds.1 603 bp 
exon 1653 1822 
exon 2789 2866 
exon 3178 3332 
exon 3777 3946 
exon 4246 4275 



GB:D85778_1 94 1498 1498 
source isSNP SNP00009024 
consequence CDO1_cds.1 93 
GB:D85781_1 97 278 278 
source isSNP SNP00036859 
consequence CDO1_cds.1 93 
GB:D85782_1 98 310 310 
source isSNP SNP00107326 
consequence CDO1_cds.1 93 

Missense 
A>G 

GIF CDO1-genomic-fwd.gif 
Link : CGI-52_link_cdna 

Subsequence GB:AF151810 1 1414 #100 
CDS GB:AF151810.1 1080 bp #101 
ORF 277 1356 
Allele GB:AF151810 100 1335 1335 A>G 
source isSNP SNP00054191 
consequence GB:AF151810.1 101 S 
GIF CGI-52-cdna-fwd.gif 
Link : CGI-52_link_genomic 

Subsequence GB:AC023176_7 1 193672 #102 
Subsequence CGI-52_mrna_build.1 131456 93050 
mRNA CGI-52_mrna_build.1 1420 bp 7 exons 
exon 131456 131084 
exon 119505 119186 
exon 97592 97445 
exon 96844 96741 
exon 96095 95978 
exon 93964 93912 
exon 93353 93050 
Allele GB:AC023176_7 102 93129 93129 H 
source isSNP SNP00054191 
Allele GB:AC023176_7 102 93416 93416 P. 
source isSNP SNP00057212 
Allele GB:AC023176_7 102 131305 1 
source isSNP SNP00069496 
GIF CGI-52-genomic-rev.gif 



CHI3L1 

Full name : chitinase 3-like 1 
Link : CHI3L1_link_cdna 

Subsequence GB:NM_001276_1 1 1925 
CDS GB:NM_001276_1.1 1152 bp #105 
ORF 127 1278 
Allele GB:NM_001276_1 104 559 559 
source isSNP SNP00008252 
consequence GB:NM_001276_1.1 105 
Allele GB:NM_001276_1 104 590 590 
source isSNP SNP00071935 
consequence GB:NM_001276_1.1 105 
Allele GB:NM_001276_1 104 646 646 
source isSNP SNP00022932 
consequence GB:NM_001276_1.1 105 
Allele GB:NM_001276_1 104 1300 1300 
source isSNP SNP00052666 
consequence GB:NM_001276_1.1 105 
Allele GB:NM_001276_1 104 1342 1342 
source isSNP SNP00072805 
consequence GB:NM_001276_1.1 105 
Allele GB:NM_001276_1 104 1739 1739 
source isSNP SNP00076686 
consequence GB:NM_001276_1.1 105 
GIF CHI3L1-cdna-fwd.gif 
Link : CHI3L1_link_genomic 



Missense 
A>G 



Missense 
G>T 



Missense 
A>G 



Subsequence CHI3L1_cds.1 1295 7276 #106 
Subsequence CHI3L1_cds.2 1295 7433 #107 
Subsequence CHI3L1_cds.3 1295 7276 #108 
Subsequence CHI3L1_cds.4 1295 2802 #109 
Subsequence GB:Y08374_1 1 1635 #110 
Subsequence GB:Y08375_1 1736 3186 #111 
Subsequence GB:Y08376_1 3287 4116 #112 
Subsequence GB:Y08377_1 4217 5035 #113 
Subsequence GB:Y08378_1 5136 7923 #114 
Subsequence CHI3L1_mrna_build.1 1169 7923 #115 
Subsequence CHI3L1_mrna_build.2 1169 7604 #116 
mRNA CHI3L1_mrna_build.2 11 exons #116 
exon 1169 1319 
exon 1572 1601 
exon 2036 2237 
exon 2789 2845 
exon 3606 3756 
exon 4517 4638 
exon 5436 5559 
exon 6069 6251 
exon 6844 6960 
exon 7296 7456 
exon 7548 7604 
CDS CHI3L1_cds.1 1152 bp 
exon 1295 1319 
exon 1572 1601 
exon 2036 2237 
exon 2789 2845 
exon 3606 3756 
exon 4517 4638 
exon 5436 5559 
exon 6069 6251 
exon 6844 6960 
exon 7136 7276 
CDS CHI3L1_cds.2 1149 bp 
exon 1295 1319 
exon 1572 1601 
exon 2036 2237 
exon 2789 2845 
exon 3606 3756 
exon 4517 4638 
exon 5436 5559 
exon 6069 6251 
exon 6844 6960 
exon 7296 7433 
CDS CHI3L1_cds.3 969 bp 
exon 1295 1319 
exon 1572 1601 
exon 2036 2237 
exon 2789 2845 
exon 3606 3756 
exon 4517 4638 
exon 5436 5559 
exon 6844 6960 
exon 7136 7276 
mRNA CHI3L1_mrna_build.1 1925 bp 
exon 1169 1319 
exon 1572 1601 
exon 2036 2237 
exon 2789 2845 
exon 3606 3756 
exon 4517 4638 
exon 5436 5559 
exon 6069 6251 
exon 6844 6960 
exon 7136 7923 
CDS CHI3L1_cds.4 69 bp 3 exons #109 
exon 1295 1319 
exon 1572 1601 
exon 2789 2802 

Allele GB:Y08376_1 112 311 311 G>T 
source isSNP SNP00071934 
consequence CHI3L1_cds.1 106 Intron 
consequence CHI3L1_cds.2 107 Intron 
consequence CHI3L1_cds.3 108 Intron 
consequence CHI3L1_cds.4 109 3' 
Allele GB:Y08376_1 112 438 438 A>G 
source isSNP SNP00008252 
consequence CHI3L1_cds.1 106 Missense 145-145 R>G 
consequence CHI3L1_cds.2 107 Missense 145-145 R>G 
consequence CHI3L1_cds.3 108 Missense 145-145 R>G 
consequence CHI3L1_cds.4 109 3' 
Allele GB:Y08377_1 113 355 355 G>T 
source isSNP SNP00022932 
consequence CHI3L1_cds.1 106 Missense 174-174 L>I 
consequence CHI3L1_cds.2 107 Missense 174-174 L>I 
consequence CHI3L1_cds.3 108 Missense 174-174 L>I 
consequence CHI3L1_cds.4 109 3' 
Allele GB:Y08378_1 114 506 506 A>G 
source isSNP SNP00005491 
consequence CHI3L1_cds.1 106 Intron 
consequence CHI3L1_cds.2 107 Intron 
consequence CHI3L1_cds.3 108 Intron 
consequence CHI3L1_cds.4 109 3' 
Allele GB:Y08378_1 114 535 535 A>G 
source isSNP SNP00005492 
consequence CHI3L1_cds.1 106 Intron 
consequence CHI3L1_cds.2 107 Intron 
consequence CHI3L1_cds.3 108 Intron 
consequence CHI3L1_cds.4 109 3' 
Allele GB:Y08378_1 114 641 641 A>G 
source 
consequence CHI3L1_cds.1 106 Intron 
consequence CHI3L1_cds.2 107 Intron 
consequence CHI3L1_cds.3 108 Intron 
consequence CHI3L1_cds.4 109 3' 
Allele GB:Y08378_1 114 1560 1560 A>G 
source isSNP SNP00028112 
consequence CHI3L1_cds.1 106 Intron 
consequence CHI3L1_cds.2 107 Intron 
consequence CHI3L1_cds.3 108 
consequence CHI3L1_cds.4 109 
Allele GB:Y08378_1 114 2163 2163 A>G 
source isSNP SNP00052666 
consequence CHI3L1_cds.1 106 
consequence CHI3L1_cds.2 107 
consequence CHI3L1_cds.3 108 
consequence CHI3L1_cds.4 109 
Allele GB:Y08378_1 114 2205 2205 A>G 
source isSNP SNP00072805 
consequence CHI3L1_cds.1 106 
consequence CHI3L1_cds.2 107 
consequence CHI3L1_cds.3 108 
consequence CHI3L1_cds.4 109 
Allele GB:Y08378_1 114 2602 2602 A>G 
source isSNP SNP00076686 
consequence CHI3L1_cds.1 106 
consequence CHI3L1_cds.2 107 
consequence CHI3L1_cds.3 108 
consequence CHI3L1_cds.4 109 
GIF CHI3L1-genomic-fwd.gif 




CHI3L2 

Full name : chitinase 3-like 2 
Link : CHI3L2_link_cdna 

Subsequence GB:HSU58514 1 #117 
CDS GB:HSU58514.1 1173 bp #118 
ORF 37 1209 
Allele GB:HSU58514 117 412 412 
source isSNP SNP00021152 
consequence GB:HSU58514.1 118 Missense 126-126 
Allele GB:HSU58514 117 581 581 A>G 
source isSNP SNP00021153 
consequence GB:HSU58514.1 118 Missense 182-182 
Allele GB:HSU58514 117 972 972 A>G 
source isSNP SNP00115597 
consequence GB:HSU58514.1 118 Silent 
Allele GB:HSU58514 117 1204 1204 A>G 
source isSNP SNP00068229 
consequence GB:HSU58514.1 118 Silent 
GIF CHI3L2-cdna-fwd.gif 
Link : CHI3L2_alt_link_cdna 

Subsequence GB:U58515_1 1 1500 #119 
CDS GB:U58515_1.1 1275 bp #120 
ORF 1 1275 
Allele GB:U58515_1 119 478 478 
source isSNP SNP00021152 
consequence GB:U58515_1.1 120 Missense 160-160 
Allele GB:U58515_1 119 647 647 A>G 
source isSNP SNP00021153 
consequence GB:U58515_1.1 120 Missense 216-216 
Allele GB:U58515_1 119 1038 1038 
source isSNP SNP00115597 
consequence GB:U58515_1.1 120 
Allele GB:U58515_1 119 1270 1270 A>G 
source isSNP SNP00068229 
consequence GB:U58515_1.1 120 
GIF CHI3L2-cdna-fwd.gif 
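
The codon positions quoted in the CHI3L2 consequence rows (for example 126-126) follow from the cDNA position of the allele and the ORF start. The following is a minimal sketch of that arithmetic, assuming 1-based cDNA coordinates and an ORF that begins at its first codon; the function name `codon_number` is illustrative and is not part of the table.

```python
def codon_number(position, orf_start):
    """Return the 1-based codon index of a cDNA position inside the ORF."""
    if position < orf_start:
        raise ValueError("position lies upstream of the ORF (5' region)")
    return (position - orf_start) // 3 + 1

# CHI3L2 cDNA entry: ORF starts at 37. Alternative cDNA entry: ORF starts at 1.
print(codon_number(412, 37))  # 126
print(codon_number(581, 37))  # 182
print(codon_number(478, 1))   # 160
print(codon_number(647, 1))   # 216
```

The four results match the codon ranges listed for the Missense rows of the two CHI3L2 cDNA entries.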



CILP 

Full name : cartilage intermediate layer protein 
Link : CILP_link_cdna 

Subsequence GB:AF035408 1 4175 #121 
CDS GB:AF035408 3555 bp #122 
ORF 130 3684 
Allele GB:AF035408 121 430 430 
source isSNP SNP00123071 
consequence GB:AF035408.1 122 
Allele GB:AF035408 121 1677 1677 A>G 
source isSNP SNP00123072 
consequence GB:AF035408.1 122 
Allele GB:AF035408 121 3066 3066 A>G 
source isSNP SNP00020276 
consequence GB:AF035408.1 122 
Allele GB:AF035408 121 3263 3263 A>G 
source isSNP SNP00123073 
consequence GB:AF035408.1 122 
Allele GB:AF035408 121 3625 3625 A>G 
source isSNP SNP00055164 
consequence GB:AF035408.1 
GIF CILP-cdna-fwd.gif 
Link : CILP_link_genomic 

Subsequence CILP_cds.1 3606 16639 #123 
Subsequence GB:AB022430_1 1 19486 #124 
Subsequence CILP_mrna_build.1 1911 17130 #125 
CDS CILP_cds.1 3555 bp 8 exons #123 
exon 3606 3666 
exon 5599 5691 
exon 6312 6581 
exon 7897 8076 
exon 8781 9095 
exon 9893 10001 
exon 11336 11493 
exon 14271 16639 
mRNA CILP_mrna_build.1 4175 bp 9 exons 
exon 1911 1933 
exon 3500 3666 
exon 5599 5691 
exon 6312 6581 
exon 7897 8076 
exon 8781 9095 
exon 9893 10001 
exon 11336 11493 
exon 14271 17130 

Allele GB:AB022430_1 124 3567 3567 G>T 
source wetSNP GB:AB022430_1.v3567.A>C 
consequence CILP_cds.1 123 5' 
Allele GB:AB022430_1 124 6458 6458 A>G 
source isSNP SNP00123071 
consequence CILP_cds.1 123 Missense 101-101 
Allele GB:AB022430_1 124 9874 9874 A>G 
source wetSNP GB:AB022430_1.v9874.C>T 
consequence CILP_cds.1 123 Intron 
Allele GB:AB022430_1 124 9881 9881 A>G 
source wetSNP GB:AB022430_1.v9881.C>T 
consequence CILP_cds.1 123 Intron 
Allele GB:AB022430_1 124 11286 11286 A>T 
source wetSNP GB:AB022430_1.v11286.T>A 
consequence CILP_cds.1 123 Intron 
Allele GB:AB022430_1 124 11491 11491 A>G 
source wetSNP GB:AB022430_1.v11491.C>T 
consequence CILP_cds.1 123 Missense 395-395 
Allele GB:AB022430_1 124 14421 14421 C>G 
source wetSNP GB:AB022430_1.v14421.G>C 
consequence CILP_cds.1 123 Missense 446-446 
Allele GB:AB022430_1 124 14542 14542 A>G 
source wetSNP GB:AB022430_1.v14542.G>A 
consequence CILP_cds.1 123 Silent 486-486 
Allele GB:AB022430_1 124 14632 14632 A>G 
source isSNP SNP00123072 
consequence CILP_cds.1 123 Silent 516-516 
Allele GB:AB022430_1 124 15116 15116 A>G 
source wetSNP GB:AB022430_1.v15116.G>A 
consequence CILP_cds.1 123 Missense 678-678 
Allele GB:AB022430_1 124 15670 15670 A>G 
source wetSNP GB:AB022430_1.v15670.G>A 
consequence CILP_cds.1 123 Silent 862-862 
Allele GB:AB022430_1 124 16021 16021 A>G 
source isSNP SNP00020276 
consequence CILP_cds.1 123 Silent 979-979 
Allele GB:AB022430_1 124 16218 16218 A>G 
source isSNP SNP00123073 
consequence CILP_cds.1 123 Missense 1045-1045 
Allele GB:AB022430_1 124 16580 16580 A>G 
source isSNP SNP00055164 
source wetSNP GB:AB022430_1.v16580.A>G 
consequence CILP_cds.1 123 Missense 1166-1166 
-genomic-fwd.gif 



COL10A1 

Full name : collagen, type X, alpha 1 
Link : COL10A1_link_cdna 

Subsequence GB:X60382_1 1 3226 #126 
CDS GB:X60382_1.2 2043 bp #127 
ORF 16 2058 
Allele GB:X60382_1 126 95 95 
source isSNP SNP00034488 
consequence GB:X60382_1.2 127 Missense 27-27 T>M 
Allele GB:X60382_1 126 2294 2294 G>T 
source isSNP SNP00113056 
consequence GB:X60382_1.2 127 3' 

GIF COL10A1-cdna-fwd.gif 



COL11A2 

Full name : collagen, type XI, alpha 
Link : FL_3421452_link_genomic 



Subsequence GB:AL031228_1 1 175737 #128 
Subsequence COL11A2_cds.1 93988 122550 #129 
Subsequence COL11A2_cds.2 93988 122550 #130 
Subsequence COL11A2_cds.3 93988 122550 #131 
Subsequence COL11A2_cds.4 93988 122550 #132 
Subsequence COL11A2_cds.5 93988 122550 #133 
Subsequence COL11A2_cds.6 93988 122550 #134 
Subsequence COL11A2_cds.7 93988 122550 #135 
Subsequence COL11A2_cds.8 93988 122550 #136 
Subsequence COL11A2_mrna_build.1 93988 122834 #137 
Subsequence COL11A2_mrna_build.2 93988 122834 #138 
Subsequence GB:AL031228_1.20 93762 123536 #139 
Subsequence GB:AL031228_1.21 93988 122550 #140 
Subsequence COL11A2_mrna_build.3 93769 125002 
mRNA GB:AL031228_1.20 6423 bp 66 exons #139 
exon 93762 94069 
exon 96759 96908 
exon 97040 97250 
exon 97704 97866 
exon 99410 99601 
exon 100450 100527 
exon 101174 101236 
exon 101904 102083 
exon 105058 105117 
exon 105223 105264 
exon 105498 105560 
exon 105896 105970 
exon 106423 106509 
exon 106741 106797 
exon 106944 106997 
exon 107102 107155 
exon 107255 107308 
exon 107496 107549 
exon 107740 107793 
exon 107876 107920 
exon 108043 108096 
exon 108522 108566 
exon 108763 108816 
exon 109003 109047 
exon 109183 109236 
exon 109463 109507 
exon 109742 109795 
exon 109925 109969 
exon 110159 110212 
exon 110547 110654 
exon 111648 111701 
exon 112010 112063 
exon 112173 112217 
exon 112302 112355 
exon 112483 112527 
exon 112673 112726 
exon 112827 112880 
exon 113115 113168 
exon 113591 113698 
exon 113850 113939 
exon 114125 114178 
exon 114408 114515 
exon 114654 114761 
exon 114904 114957 
exon 115061 115114 
exon 115311 115418 
exon 115618 115671 
exon 115849 115902 
exon 116128 116181 
exon 116344 116397 
exon 116738 116845 
exon 117220 117273 
exon 117469 117522 
exon 117656 117709 
exon 118376 118429 
exon 118695 118802 
exon 118911 118964 
exon 119105 119158 
exon 119401 119508 
exon 119662 119715 
exon 120022 120057 
exon 120244 120297 
exon 120412 120679 
exon 121264 121376 
exon 121755 121961 
exon 122410 123536 



mRNA COL11A2_mrna_build.3
exon 93769 94341
exon 95759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 101174 101236
exon 101904 102083
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112577
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116196
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122183 122332
exon 122410 123530
exon 124988 125002

CDS COL11A2_cds.6 5157 bp 65 exons #134
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 100450 100527
exon 101174 101236
exon 101904 102083
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122550
CDS GB:AL031228_1.21 5211 bp 66 exons #140
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 100450 100527
exon 101174 101236
exon 101904 102083
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 119662 119715
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122550
CDS COL11A2_cds.7 5049 bp 64 exons #135



exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 101174 101236
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122183 122332
exon 122410 122550



CDS COL11A2_cds.8 4986 bp 63 exons #136
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122183 122332
exon 122410 122550




CDS COL11A2_cds.1 4890 bp 63 exons #129
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 119662 119715
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122550



CDS COL11A2_cds.2 4953 bp 64 exons #130
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 101174 101236
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 119662 119715
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122550



CDS COL11A2_cds.3 5307 bp 66 exons #131
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 100450 100527
exon 101174 101236
exon 101904 102083
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122183 122332
exon 122410 122550
mRNA COL11A2_mrna_build.1 5174 bp 63 exons #137
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 119662 119715
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122834
CDS COL11A2_cds.4 4836 bp 62 exons #132



exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122550



mRNA COL11A2_mrna_build.2 5237 bp 64 exons #138
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 101174 101236
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 119662 119715
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122834
CDS COL11A2_cds.5 4899 bp 63 exons #133
exon 93988 94069
exon 96759 96908
exon 97040 97250
exon 97704 97866
exon 99410 99601
exon 101174 101236
exon 105058 105117
exon 105223 105264
exon 105498 105560
exon 105896 105970
exon 106423 106509
exon 106741 106797
exon 106944 106997
exon 107102 107155
exon 107255 107308
exon 107496 107549
exon 107740 107793
exon 107876 107920
exon 108043 108096
exon 108522 108566
exon 108763 108816
exon 109003 109047
exon 109183 109236
exon 109463 109507
exon 109742 109795
exon 109925 109969
exon 110159 110212
exon 110547 110654
exon 111648 111701
exon 112010 112063
exon 112173 112217
exon 112302 112355
exon 112483 112527
exon 112673 112726
exon 112827 112880
exon 113115 113168
exon 113591 113698
exon 113850 113939
exon 114125 114178
exon 114408 114515
exon 114654 114761
exon 114904 114957
exon 115061 115114
exon 115311 115418
exon 115618 115671
exon 115849 115902
exon 116128 116181
exon 116344 116397
exon 116738 116845
exon 117220 117273
exon 117469 117522
exon 117656 117709
exon 118376 118429
exon 118695 118802
exon 118911 118964
exon 119105 119158
exon 119401 119508
exon 120022 120057
exon 120244 120297
exon 120412 120679
exon 121264 121376
exon 121755 121961
exon 122410 122550
Allele GB:AL031228_1
source isSNP SNP00027609
consequence COL11A2_cds.6 134 3'
consequence GB:AL031228_1.21 140 3'
consequence COL11A2_cds.7 135 3'
consequence COL11A2_cds.8 136 3'
consequence COL11A2_cds.1 129 3'
consequence COL11A2_cds.2 130 3'
consequence COL11A2_cds.3 131 3'
consequence COL11A2_cds.4 132
consequence COL11A2_cds.5 133
GIF COL11A2-genomic-fwd.gif



COL9A2 

Full name : collagen, type IX, alpha 2 
Link : FL_3482334_link_cdna

Subsequence FN:3482334CB1 1 2864
CDS FN:3482334CB1.1 2079 bp #143
ORF 99 2177
Allele FN:3482334CB1 142 1087 1087 A>G
source isSNP SNP00032502
consequence FN:3482334CB1.1 143 Missense 330-330 Q>R
Allele FN:3482334CB1 142 1113 1113 C>G
source isSNP SNP00107342
consequence FN:3482334CB1.1 143 Missense 339-339 L>V
Allele FN:3482334CB1 142 1301 1301 A>G
source isSNP SNP00107343
consequence FN:3482334CB1.1 143 Silent 401-401
Allele FN:3482334CB1 142 1345 1345 C>G
source isSNP SNP00107344
consequence FN:3482334CB1.1 143 Missense
Allele FN:3482334CB1 142 2211 2211 A>G
source isSNP SNP00067542
consequence FN:3482334CB1.1 143
Allele FN:3482334CB1 142 2317 2317
source isSNP SNP00032503
consequence FN:3482334CB1.1
GIF COL9A2-cdna-fwd.gif
Link : FL_1651412_link_cdna

Subsequence FN:1651412CB1 1 2869
CDS FN:1651412CB1.1 2067 bp #145
ORF 68 2134
Allele FN:1651412CB1 144 1044 1044 A>G
source isSNP SNP00032502
consequence FN:1651412CB1.1 145 Missense 326-326
Allele FN:1651412CB1 144 1070 1070 C>G
source isSNP SNP00107342
consequence FN:1651412CB1.1 145 Missense 335-335
Allele FN:1651412CB1 144 1258 1258 A>G
source isSNP SNP00107343
consequence FN:1651412CB1.1 145 Silent 397-397
Allele FN:1651412CB1 144 1302 1302 C>G
source isSNP SNP00107344
consequence FN:1651412CB1.1 145 Missense 412-412 R>Q
Allele FN:1651412CB1 144 2168 2168 A>G
source isSNP SNP00067542
consequence FN:1651412CB1.1 145
Allele FN:1651412CB1 144 2274 2274
source isSNP SNP00032503
consequence FN:1651412CB1.1
GIF COL9A2-cdna-fwd.gif
Link : FL_1651412_link_genomic

Subsequence GB:AF019406 1 17606 #146
Subsequence GB:AF019406_1651412CD1 1115 17091 #147
Subsequence GB:AF019406_3482334CD1 1115 17091 #148
Subsequence FL_1651412_mrna_build.1 1048 17606 #149
Subsequence FL_3482334_mrna_build.1 1017 17606 #150

mRNA FL_1651412_mrna_build.1 2649 bp 32 exons #149
exon 1048 1189
exon 2635 2709
exon 3905 3940
exon 4025 4087
exon 5507 5560
exon 5682 5717
exon 5811 5834
exon 6178 6231
exon 6573 6626
exon 6741 6788
exon 7002 7058
exon 7142 7195
exon 7521 7574
exon 7971 8024
exon 8124 8177
exon 8297 8350
exon 10041 10094
exon 10530 10583
exon 10787 10840
exon 12101 12145
exon 12519 12572
exon 13436 13489
exon 13754 13807
exon 13892 13963
exon 14184 14219
exon 14311 14355
exon 14440 14472
exon 14603 14749
exon 15093 15147
exon 15467 15655
exon 16387 16464
exon 16895 17606



CDS GB:AF019406_3482334CD1 2079 bp 32 exons #148
exon 1115 1189
exon 2635 2709
exon 3905 3940
exon 4025 4087
exon 5507 5560
exon 5682 5717
exon 5811 5834
exon 6178 6231
exon 6573 6626
exon 6741 6800
exon 7002 7058
exon 7142 7195
exon 7521 7574
exon 7971 8024
exon 8124 8177
exon 8297 8350
exon 10041 10094
exon 10530 10583
exon 10787 10840
exon 12101 12145
exon 12519 12572
exon 13436 13489
exon 13754 13807
exon 13892 13963
exon 14184 14219
exon 14311 14355
exon 14440 14472
exon 14603 14749
exon 15093 15147
exon 15467 15655
exon 16387 16464
exon 16895 17091
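
The bp figure in a CDS or mRNA header can be cross-checked against its exon rows. The following is a minimal Python sketch, not part of the original table, which assumes that exon coordinates are 1-based and inclusive (an assumption, though one that agrees with the totals listed here). The names `EXONS_3482334CD1` and `spliced_length` are hypothetical and chosen only for illustration; the coordinates are those of the CDS GB:AF019406_3482334CD1 record.

```python
# Exon coordinates (start, end) of CDS GB:AF019406_3482334CD1,
# copied from the table; assumed 1-based and inclusive.
EXONS_3482334CD1 = [
    (1115, 1189), (2635, 2709), (3905, 3940), (4025, 4087),
    (5507, 5560), (5682, 5717), (5811, 5834), (6178, 6231),
    (6573, 6626), (6741, 6800), (7002, 7058), (7142, 7195),
    (7521, 7574), (7971, 8024), (8124, 8177), (8297, 8350),
    (10041, 10094), (10530, 10583), (10787, 10840), (12101, 12145),
    (12519, 12572), (13436, 13489), (13754, 13807), (13892, 13963),
    (14184, 14219), (14311, 14355), (14440, 14472), (14603, 14749),
    (15093, 15147), (15467, 15655), (16387, 16464), (16895, 17091),
]


def spliced_length(exons):
    """Return the summed length of inclusive exon spans.

    abs() lets the same function handle reverse-strand records,
    where the listed start is larger than the listed end.
    """
    return sum(abs(end - start) + 1 for start, end in exons)


if __name__ == "__main__":
    # Header of the record states: 2079 bp, 32 exons.
    print(len(EXONS_3482334CD1), spliced_length(EXONS_3482334CD1))  # 32 2079
```

The same check applied to the other exon lists is a quick way to spot a digit that was misread in a scanned copy, since a single wrong coordinate changes the total.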

mRNA FL_3482334_mrna_build.1 2692 bp 32 exons #150
exon 1017 1189
exon 2635 2709
exon 3905 3940
exon 4025 4087
exon 5507 5560
exon 5682 5717
exon 5811 5834
exon 6178 6231
exon 6573 6626
exon 6741 6800
exon 7002 7058
exon 7142 7195
exon 7521 7574
exon 7971 8024
exon 8124 8177
exon 8297 8350
exon 10041 10094
exon 10530 10583
exon 10787 10840
exon 12101 12145
exon 12519 12572
exon 13436 13489
exon 13754 13807
exon 13892 13963
exon 14184 14219
exon 14311 14355
exon 14440 14472
exon 14603 14749
exon 15093 15147
exon 15467 15655
exon 16387 16464
exon 16895 17606
CDS GB:AF019406_1651412CD1 2067 bp
exon 1115 1189
exon 2635 2709
exon 3905 3940
exon 4025 4087
exon 5507 5560
exon 5682 5717
exon 5811 5834
exon 6178 6231
exon 6573 6626
exon 6741 6788
exon 7002 7058
exon 7142 7195
exon 7521 7574
exon 7971 8024
exon 8124 8177
exon 8297 8350
exon 10041 10094
exon 10530 10583
exon 10787 10840
exon 12101 12145
exon 12519 12572
exon 13436 13489
exon 13754 13807
exon 13892 13963
exon 14184 14219
exon 14311 14355
exon 14440 14472
exon 14603 14749
exon 15093 15147
exon 15467 15655
exon 16387 16464
exon 16895 17091
Allele GB:AF019406 146 10809 10809 A>G
source isSNP SNP00032502
consequence GB:AF019406_3482334CD1 Q>R
consequence GB:AF019406_1651412CD1 147
Allele GB:AF019406 146 13783 13783 A>G
source isSNP SNP00107343
consequence GB:AF019406_3482334CD1 148 Silent 401-401
consequence GB:AF019406_1651412CD1 147 Silent 397-397
Allele GB:AF019406 146 17229 17229 A>G
source isSNP SNP00032503
consequence GB:AF019406_3482334CD1 148
consequence GB:AF019406_1651412CD1 147
GIF COL9A2-genomic-fwd.gif



COMP 

Full name : cartilage oligomeric matrix protein 
Link : FL_1901242_link_cdna

Subsequence FN:1901242CB1 2447
CDS FN:1901242CB1.1 2274 bp #152
ORF 23 2296
Allele FN:1901242CB1 151 1200 1200 A>G
source isSNP SNP00017026
consequence FN:1901242CB1.1 152 Missense
Allele FN:1901242CB1 151 1319 1319 C>G
source isSNP SNP00108392
consequence FN:1901242CB1.1 152 Missense
Allele FN:1901242CB1 151 1335 1335 C>G
source isSNP SNP00017027
consequence FN:1901242CB1.1 152 Missense
Allele FN:1901242CB1 151 1777 1777 A>G
source isSNP SNP00017029
consequence FN:1901242CB1.1 152 Silent
GIF COMP-cdna-fwd.gif
Link : FL_1901242_link_genomic

Subsequence GB:AC003107 1 46275 #153
Subsequence GB:AC003107_1901242CD1 32077 23724 #154
Subsequence FL_1901242_mrna_build.1 32099 23582 #155

CDS GB:AC003107_1901242CD1 2274 bp 19 exons #154


exon 32077 31999
exon 31743 31658
exon 31421 31370
exon 30922 30750
exon 30105 29968
exon 29721 29647
exon 29558 29400
exon 29322 29218
exon 29127 29020
exon 28458 28299
exon 27459 27341
exon 27100 27048
exon 26955 26774
exon 26660 26482
exon 26355 26307
exon 25901 25705
exon 25172 25000
exon 24002 23863
exon 23770 23724


mRNA FL_1901242_mrna_build.1
exon 32099 31999
exon 31743 31658
exon 31421 31370
exon 30922 30750
exon 30105 29968
exon 29721 29647
exon 29558 29400
exon 29322 29218
exon 29127 29020
exon 28458 28299
exon 27459 27341
exon 27100 27048
exon 26955 26774
exon 26660 26482
exon 26355 26307
exon 25901 25705
exon 25172 25000
exon 24002 23863
exon 23770 23582
S>L
Allele GB:AC003107 153 25864 25864 A>G
source isSNP SNP00017029
consequence GB:AC003107_1901242CD1 154 585-585
Allele GB:AC003107 153 27417 27417 A>G
source isSNP SNP00017026
consequence GB:AC003107_1901242CD1 154 Missense 393-393
Allele GB:AC003107 153 32082 32082 A>G
source isSNP SNP00017025
consequence GB:AC003107_1901242CD1
GIF COMP-genomic-rev.gif



CRLF1

Full name : cytokine receptor-like factor 1
Link : CRLF1_link_cdna

Subsequence GB:AF073515_1 1 1804
CDS GB:AF073515_1.1 1269 bp #157
ORF 204 1472
Allele GB:AF073515_1 156 9 84
source isSNP SNP00015261
consequence GB:AF073515_1.1
GIF CRLF1-cdna-fwd.gif



CRP

Full name : C-reactive protein
Link : CRP_link_cdna

Subsequence GB:X56214_1 #158
CDS GB:X56214_1.1 675 bp #159
ORF 90 764
Allele GB:X56214_1 158 447 447
source isSNP SNP00100892
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 988 988 A>G
source isSNP SNP00029575
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1010 1010 A>G
source isSNP SNP00076237
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1146 1146 C>G
source isSNP SNP00076238
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1175 1175 G>T
source isSNP SNP00100893
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1406 1406 A>G
source isSNP SNP00100894
consequence GB:X56214_1.1 159
Allele GB:X56214_1 158 1525 1525 A>G
source isSNP SNP00100895
consequence GB:X56214_1.1
GIF CRP-cdna-fwd.gif
Link : CRP_link_genomic 

Subsequence GB:HUMCRPGA 1 2480 #160 



Allele GB:HUMCRPGA 160 865 865 A>G
source isSNP SNP00100892
Allele GB:HUMCRPGA 160 1404 1404 A>G
source isSNP SNP00029575
Allele GB:HUMCRPGA 160 1426 1426 A>G
source isSNP SNP00076237
Allele GB:HUMCRPGA 160 1562 1562 C>G
source isSNP SNP00076238
Allele GB:HUMCRPGA 160 1591 1591 G>T
source isSNP SNP00100893
Allele GB:HUMCRPGA 160 1822 1822 A>G
source isSNP SNP00100894
Allele GB:HUMCRPGA 160 1941 1941 A>G
source isSNP SNP00100895
Allele GB:HUMCRPGA 160 2045 2045 A>G
source isSNP SNP00100896
Allele GB:HUMCRPGA 160 2159 2159 A>G
source isSNP SNP00100897
Allele GB:HUMCRPGA 160 2260 2260 A>G
source isSNP SNP00006286





CRTL1

Full name : cartilage linking protein 1
Link : CRTL1_link_cdna
Subsequence GB:HSU43328 1 1759 #161
CDS GB:HSU43328.1 1065 bp #162
ORF 118 1182
Allele GB:HSU43328 161 801 801
source isSNP SNP00020236
consequence GB:HSU43328.1 162
Allele GB:HSU43328 161 1454 1454 A>G
source isSNP SNP00002295
consequence GB:HSU43328.1
GIF CRTL1-cdna-fwd.gif



CTSC 

Full name : cathepsin C 
Link : CTSC_link_cdna
Subsequence GB:NM_001814 1 1838 #163
CDS GB:NM_001814.1 1392 bp #164
ORF 34 1425
Allele GB:NM_001814 163 491 491 A>G
source isSNP SNP00006579
consequence GB:NM_001814.1 164 Missense 153-153 T>I
Allele GB:NM_001814 163 1205 1206 G>T
source isSNP SNP00006580
consequence GB:NM_001814.1 164 Silent 391-391 T
Allele GB:NM_001814 163 1224 1224 A>G
source isSNP SNP00105444
consequence GB:NM_001814.1
GIF CTSC-cdna-fwd.gif
Link : CTSC_link_genomic



Subsequence CTSC_cds.1 150285 106619 #165
Subsequence CTSC_cds.2 150285 106619 #166
Subsequence GB:AC011088_8 1 164991 #167
Subsequence CTSC_mrna_build.1 150318 106206 #168
CDS CTSC_cds.1 1392 bp 7 exons #165
exon 150285 150114
exon 147695 147550
exon 125167 125001
exon 121931 121776
exon 113258 113143
exon 108877 108746
exon 107121 106619
CDS CTSC_cds.2 1260 bp 6 exons
exon 150285 150114
exon 147695 147550
exon 125167 125001
exon 121931 121776
exon 113258 113143
exon 107121 106619
mRNA CTSC_mrna_build.1 1838 bp 7 exons #168
exon 150318 150114
exon 147695 147550
exon 125167 125001
exon 121931 121776
exon 113258 113143
exon 108877 108746
exon 107121 106206
Allele GB:AC011088_8 167 106820 106820 A>G
source isSNP SNP00105444
consequence CTSC_cds.1 165 Silent 397-397
consequence CTSC_cds.2 166 Silent 353-353
Allele GB:AC011088_8 167 106838 106838 G>T
source isSNP SNP00006580
consequence CTSC_cds.1 165 Silent 391-391
consequence CTSC_cds.2 166 Silent 347-347
Allele GB:AC011088_8 167 122438 122438 A>G
source dbSNP gnl|dbSNP|ss1078568_allele
source dbSNP gnl|dbSNP|ss1088590_allele
source dbSNP gnl|dbSNP|ss382670_allele
source dbSNP gnl|dbSNP|ss403413_allele
consequence CTSC_cds.1 165 Intron
consequence CTSC_cds.2 166 Intron
Allele GB:AC011088_8 167 124932 124932 A>T
source wetSNP GB:AC011088_8.v124932.A>T
consequence CTSC_cds.1 165 Intron
consequence CTSC_cds.2 166 Intron
Allele GB:AC011088_8 167 125028 125028 A>G
source isSNP SNP00006579
source wetSNP GB:AC011088_8.v125028.A>G
consequence CTSC_cds.1 165 Missense 153-153 I>T
consequence CTSC_cds.2 166 Missense 153-153 I>T
Allele GB:AC011088_8 167 142996 142996 A>G
source dbSNP gnl|dbSNP|ss1530135_allele
consequence CTSC_cds.1 165 Intron
consequence CTSC_cds.2 166 Intron
Allele GB:AC011088_8 167 150261 150261 A>G
source wetSNP GB:AC011088_8.v150261.G>A
consequence CTSC_cds.1 165 Missense 9-9 L>F
consequence CTSC_cds.2 166 Missense 9-9 L>F
Allele GB:AC011088_8 167 150303 150303 A>G
source isSNP SNP00067426
consequence CTSC_cds.1 165
consequence CTSC_cds.2 166
GIF CTSC-genomic-rev.gif



CTSL 

Full name : cathepsin L 

Link : CTSL_link_genomic
Subsequence CTSL_cds.1 35962 179319 #169
Subsequence GB:AL160279_2 1 186528 #170
Subsequence CTSL_mrna_build.1 34477 179604 #171
Subsequence CTSL_cds.2 35962 179319 #172
mRNA CTSL_mrna_build.1 1577 bp 8 exons #171
exon 34477 34756
exon 35952 36087
exon 36385 36507
exon 36608 36754
exon 36943 37167
exon 37931 38093
exon 38739 38856
exon 179220 179604
CDS CTSL_cds.1 1002 bp 7 exons
exon 35962 36087
exon 36385 36507
exon 36608 36754
exon 36943 37167
exon 37931 38093
exon 38739 38856
exon 179220 179319
CDS CTSL_cds.2 777 bp 6 exons
exon 35962 36087
exon 36385 36507
exon 36608 36754
exon 37931 38093
exon 38739 38856
exon 179220 179319
Allele GB:AL160279_2 170 35919 35919 C>G
source wetSNP GB:AL160279_2.v35919.C>G
consequence CTSL_cds.1 169 5'
consequence CTSL_cds.2 172 5'
Allele GB:AL160279_2 170 36118 36118 A>G
source wetSNP GB:AL160279_2.v36118.C>T
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 36191 36191 G>T
source wetSNP GB:AL160279_2.v36191.C>A
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 44998 44998 A>G
source isSNP SNP00043782
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 45748 45748 A>G
source isSNP SNP00007530
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 45833 45833 C>G
source isSNP SNP00100366
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 46188 46188 A>G
source isSNP SNP00100365
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 46599 46599 C>G
source isSNP SNP00061067
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 46662 46662 C>G
source isSNP SNP00100364
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 65760 65760 A>G
source isSNP SNP00048929
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 81133 81133 A>G
source dbSNP gnl|dbSNP|ss920176_allele
source dbSNP gnl|dbSNP|ss1066694_allele
source dbSNP gnl|dbSNP|ss402532_allele
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 104937 104937
source isSNP SNP00055641
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 115466 115466
source isSNP SNP00100363
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 127655 127655
source dbSNP gnl|dbSNP|ss810769_allele
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
Allele GB:AL160279_2 170 149731 149731
source dbSNP gnl|dbSNP|ss1452230_allele
consequence CTSL_cds.1 169 Intron
consequence CTSL_cds.2 172 Intron
GIF CTSL-genomic-fwd.gif
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DAF 

Full name : decay accelerating factor for complement 
Link : DAF_link_genomic 



Subsequence DAF_cds.1 131174 169024 #173
Subsequence DAF_cds.2 131174 169024 #174
Subsequence GB:AC031978_3 1 170170 #175
Subsequence DAF_mrna_build.1 131109 169897
CDS DAF_cds.1 1146 bp 10 exons #173
exon 131174 131273
exon 131790 131975
exon 133967 134158
exon 135030 135129
exon 136160 136245
exon 140516 140704
exon 146101 146226
exon 146737 146817
exon 148808 148828
exon 168960 169024
CDS DAF_cds.2 1125 bp 9 exons
exon 131174 131273
exon 131790 131975
exon 133967 134158
exon 135030 135129
exon 136160 136245
exon 140516 140704
exon 146101 146226
exon 146737 146817
exon 168960 169024
mRNA DAF_mrna_build.1 2084 bp
exon 131109 131273
exon 131790 131975
exon 133967 134158
exon 135030 135129
exon 136160 136245
exon 140516 140704
exon 146101 146226
exon 146737 146817
exon 148808 148828
exon 168960 169897
Allele GB:AC031978_3 175 132041 132041 A>G
source wetSNP GB:AC031978_3.v132041.C>T
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 146352 146352 A>G
source isSNP SNP00072272
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 146611 146611
source isSNP SNP00072273
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 146659 146659
source isSNP SNP00030860
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 165604 165604
source isSNP SNP00102533
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron
Allele GB:AC031978_3 175 165743 165743
source isSNP SNP00102534
consequence DAF_cds.1 173 Intron
consequence DAF_cds.2 174 Intron



GIF DAF-genomic-fwd.gif 



E2F6 

Full name : E2F transcription factor 6 

Link : E2F6_link_cdna 

Subsequence GB:AF041381 1 2027 #177 

Allele GB:AF041381 177 1399 1399 A>G

source isSNP SNP00002319 



Full name : EGF 

Link : EGF_link_cdna 

Subsequence GB:HSEGFRER 1 4871 #178
CDS GB:HSEGFRER.1 3624 bp #179
ORF 437 4060
Allele GB:HSEGFRER 178 4453 4453 A>G
source isSNP SNP00043643
consequence GB:HSEGFRER.1 179 3'
GIF EGF-cdna-fwd.gif
Link : EGF_link_genomic
Subsequence GB:AC005509 1 143391 #180
Subsequence GB:AC004050 270590 143492 #181
Subsequence EGF_cds.1 64892 166730 #182
Subsequence EGF_mrna_build.1 64456 167538 #183
CDS EGF_cds.1 3624 bp 24 exons #182



exon 64892 65018
exon 92502 92701
exon 94810 94991
exon 95398 95625
exon 96629 96831
exon 110868 110993
exon 112423 112545
exon 113419 113541
exon 114729 114854
exon 115957 116093
exon 120527 120675
exon 126259 126363
exon 127568 127791
exon 131528 131695
exon 132382 132531
exon 134978 135097
exon 139300 139416
exon 143859 143984
exon 148522 148644
exon 150008 150155
exon 154954 155121
exon 159780 159897
exon 163427 163505
exon 166477 166730
mRNA EGF_mrna_build.1 4868 bp
exon 64456 65018
exon 92502 92701
exon 94810 94991
exon 95398 95625
exon 96629 96831
exon 110868 110993
exon 112423 112545
exon 113419 113541
exon 114729 114854
exon 115957 116093
exon 120527 120675
exon 126259 126363
exon 127568 127791
exon 131528 131695
exon 132382 132531
exon 134978 135097
exon 139300 139416
exon 140140 140265
exon 148522 148644
exon 150008 150155
exon 154954 155121
exon 159780 159897
exon 163427 163505
exon 166477 167538
Allele GB:AC005509 180 70903 70903 A>G
source dbSNP gnl|dbSNP|ss875266_allele
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 92638 92638 A>G
source wetSNP GB:AC005509.v92638.C>T
consequence EGF_cds.1 182 Silent 88-88 I
Allele GB:AC005509 180 92670 92670 A>G
source wetSNP GB:AC005509.v92670.A>G
consequence EGF_cds.1 182 Missense 99-99 Q>R
Allele GB:AC005509 180 92763 92763 A>G
source wetSNP GB:AC005509.v92763.C>T
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 94933 94933 A>G
source wetSNP GB:AC005509.v94933.C>T
consequence EGF_cds.1 182 Missense 151-151
Allele GB:AC005509 180 95444 95444 C>G
source wetSNP GB:AC005509.v95444.G>C
consequence EGF_cds.1 182 Missense 186-186
Allele GB:AC005509 180 96578 96578 G>T
source wetSNP GB:AC005509.v96578.A>C
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 96660 96660 C>G
source wetSNP GB:AC005509.v96660.G>C
consequence EGF_cds.1 182 Missense 257-257
Allele GB:AC005509 180 96842 96842 A>G
source wetSNP GB:AC005509.v96842.G>A
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 96853 96853 A>G
source wetSNP GB:AC005509.v96853.G>A
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 100795 100795 G>T
source dbSNP gnl|dbSNP|ss48546_allele
source dbSNP gnl|dbSNP|ss569965_allele
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 112451 112451 A>G
source wetSNP GB:AC005509.v112451.T>C
consequence EGF_cds.1 182 Silent 365-365
Allele GB:AC005509 180 113396 113396 A>G
source wetSNP GB:AC005509.v113396.T>C
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 113521 113521 A>G
source wetSNP GB:AC005509.v113521.G>A
consequence EGF_cds.1 182 Missense 431-431
Allele GB:AC005509 180 114696 114696 A>G
source wetSNP GB:AC005509.v114696.C>T
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 126323 126323 A>G
source wetSNP GB:AC005509.v126323.A>G
consequence EGF_cds.1 182 Missense 597-597
Allele GB:AC005509 180 127715 127715 A>G
source wetSNP GB:AC005509.v127715.C>T
consequence EGF_cds.1 182 Silent 659-659
Allele GB:AC005509 180 131547 131547 A>G
source wetSNP GB:AC005509.v131547.A>G
consequence EGF_cds.1 182 Silent 691-691
Allele GB:AC005509 180 131598 131598 A>G
source wetSNP GB:AC005509.v131598.G>A
consequence EGF_cds.1 182 Missense 708-708
Allele GB:AC005509 180 131641 131641 C>G
source wetSNP GB:AC005509.v131641.G>C
consequence EGF_cds.1 182 Missense 723-723
Allele GB:AC005509 180 132511 132511 A>T
source wetSNP GB:AC005509.v132511.A>T
consequence EGF_cds.1 182 Missense 784-784
Allele GB:AC005509 180 139281 139281 A>G
source wetSNP GB:AC005509.v139281.G>A
consequence EGF_cds.1 182 Intron
Allele GB:AC005509 180 139333 139333 A>G
source wetSNP GB:AC005509.v139333.T>C
consequence EGF_cds.1 182 Missense 842-842
Allele GB:AC004050 181 126737 126737 G>T
source wetSNP GB:AC004050.v126737.C>A
consequence EGF_cds.1 182 Intron
Allele GB:AC004050 181 122948 122948 A>G
source isSNP SNP00118827
consequence EGF_cds.1 182 Intron
Allele GB:AC004050 181 122045 122045 A>T
source wetSNP GB:AC004050.v122045.A>T
consequence EGF_cds.1 182 Missense 920-920 E>V
Allele GB:AC004050 181 110980 110980 G>T
source isSNP SNP00101773
consequence EGF_cds.1 182 Intron
Allele GB:AC004050 181 110796 110796 A>G
source wetSNP GB:AC004050.v110796.A>G
consequence EGF_cds.1 182 Silent 1063-1063 L
Allele GB:AC004050 181 104082 104083 GC>GCC
source wetSNP GB:AC004050.v104082.GC>GCC
consequence EGF_cds.1 182 Frameshift 1134-1135
Allele GB:AC004050 181 103468 103468 A>G
source isSNP SNP00043643
consequence EGF_cds.1 182 3'
GIF EGF-genomic-fwd.gif



FDFT1

Full name : farnesyl-diphosphate farnesyltransferase 1
Link : FDFT1_link_cdna
Subsequence GB:FDFT1 1 1649 #184
CDS GB:FDFT1.1 1254 bp #185
ORF 45 1298
Allele GB:FDFT1 184 65 65 A>G
source isSNP SNP00072434
consequence GB:FDFT1.1 185 Silent
Allele GB:FDFT1 184 178 178 A>G
source isSNP SNP00065489
consequence GB:FDFT1.1 185 Missense 45-45 K>R
Allele GB:FDFT1 184 245 245 A>G
source isSNP SNP00018570
consequence GB:FDFT1.1 185 Silent
Allele GB:FDFT1 184 590 590 A>G
source isSNP SNP00123116
consequence GB:FDFT1.1 185 Silent
Allele GB:FDFT1 184 1016 1016 C>G
source isSNP SNP00003188
consequence GB:FDFT1.1 185 Silent
Allele GB:FDFT1 184 1220 1220 A>G
source isSNP SNP00123117
consequence GB:FDFT1.1 185 Silent
Allele GB:FDFT1 184 1532 1532 A>G
source isSNP SNP00003189
consequence GB:FDFT1.1 185
GIF FDFT1-cdna-fwd.gif
Link : FDFT1_link_genomic



Subsequence FDFT1_cds.1 5681 37973 #186
Subsequence GB:AC025857_2_000033 1 19420 #187
Subsequence GB:AC025857_2_000021 19521 25487 #188
Subsequence GB:AC025857_2_000014 29099 25588 #189
Subsequence GB:AC025857_2_000029 29200 40859 #190
Subsequence FDFT1_mrna_build.1 5639 38324 #191
mRNA FDFT1_mrna_build.1 8 exons
exon 5639 5779
exon 11642 11739
exon 12515 12698
exon 24238 24366
exon 26209 26400
exon 29608 29784
exon 30882 31034
exon 37752 38324
CDS FDFT1_cds.1 1254 bp 8 exons
exon 5681 5779
exon 11642 11739
exon 12515 12698
exon 24238 24366
exon 26209 26400
exon 29608 29784
exon 30882 31034
exon 37752 37973
Allele GB:AC025857_2_000033 187 5701 5701 A>G
source isSNP SNP00072434
consequence FDFT1_cds.1 186 Silent 7-7
Allele GB:AC025857_2_000033 187 6103 6103 C>G
source isSNP SNP00072231
consequence FDFT1_cds.1 186 Intron
Allele GB:AC025857_2_000033 187 11676 11676 A>G
source isSNP SNP00065489
consequence FDFT1_cds.1 186 Missense 45-45 K>R
Allele GB:AC025857_2_000014 189 2856 2856 A>G
source isSNP SNP00123116
consequence FDFT1_cds.1 186 Silent 182-182
Allele GB:AC025857_2_000029 190 1775 1775 C>G
source isSNP SNP00003188
consequence FDFT1_cds.1 186 Silent 324-324
Allele GB:AC025857_2_000029 190 5704 5704 A>G
source isSNP SNP00096026
consequence FDFT1_cds.1 186 Intron
Allele GB:AC025857_2_000029 190 8528 8528 A>G
source isSNP SNP00105147
consequence FDFT1_cds.1 186 Intron
Allele GB:AC025857_2_000029 190 8696 8696 A>G
source isSNP SNP00123117
consequence FDFT1_cds.1 186 Silent 392-392
Allele GB:AC025857_2_000029 190 9008 9008 A>G
source isSNP SNP00003189
consequence FDFT1_cds.1 186
Allele GB:AC025857_2_000029 190 9148 9148 G>T
source isSNP SNP00003190
consequence FDFT1_cds.1 186
GIF FDFT1-genomic-fwd.gif



FGF1

Full name : Fibroblast growth factor 1 (acidic)
Link : FGF1_link_cdna
Subsequence GB:X51943_1 1 2259 #192
CDS GB:X51943_1.1 468 bp #193
ORF 35 502
Allele GB:X51943_1 192 590 590
source isSNP SNP00075582
consequence GB:X51943_1.1 193
Allele GB:X51943_1 192 785 785 G>T
source isSNP SNP00075583
consequence GB:X51943_1.1 193
Allele GB:X51943_1 192 1855 1855 A>G
source isSNP SNP00069845
consequence GB:X51943_1.1 193
Allele GB:X51943_1 192 2007 2007 C>G
source isSNP SNP00075584
consequence GB:X51943_1.1
GIF FGF1-cdna-fwd.gif
Link : FL_2535357_link_genomic
Subsequence GB:AC005370 1 76416 #194
Subsequence GB:AC005370_3284782CD1 45026 63860 #195
Subsequence FL_3284782_mrna_build.1 44979 67355 #196
mRNA FL_3284782_mrna_build.1 920 bp
exon 44979 45194
exon 58348 58451
exon 63669 64259
exon 67347 67355
CDS GB:AC005370_3284782CD1 465 bp
exon 45026 45194
exon 58348 58451
exon 63669 63860
Allele GB:AC005370 194 63951 63951 A>G
source isSNP SNP00075582
consequence GB:AC005370_3284782CD1
Allele GB:AC005370 194 64146 64146 G>T
source isSNP SNP00075583
consequence GB:AC005370_3284782CD1
Allele GB:AC005370 194 65119 65119 G>T
source isSNP SNP00012384
consequence GB:AC005370_3284782CD1
Allele GB:AC005370 194 65217 65217 A>G
source isSNP SNP00069845
consequence GB:AC005370_3284782CD1
Allele GB:AC005370 194 65369 65369 C>G
source isSNP SNP00075584
consequence GB:AC005370_3284782CD1
Allele GB:AC005370 194 66005 66005 A>G
source isSNP SNP00045433
consequence GB:AC005370_3284782CD1
GIF FGF1-genomic-fwd.gif



FGF2 

Full name : fibroblast growth factor 2 (basic) 

Link : FGF2_link_cdna
Subsequence GB:FGF2 1 6757 #197
CDS GB:FGF2.1 633 bp #198
ORF 302 934
Allele GB:FGF2 197 1651 1651
source isSNP SNP00023270
consequence GB:FGF2.1 198
Allele GB:FGF2 197 1691 1691
source isSNP SNP00058183
consequence GB:FGF2.1 198
Allele GB:FGF2 197 4603 4603
source isSNP SNP00036340
consequence GB:FGF2.1 198
Allele GB:FGF2 197 4909 4909
source isSNP SNP00036341
consequence GB:FGF2.1 198
Allele GB:FGF2 197 5455 5455
source isSNP SNP00123025
consequence GB:FGF2.1 198
Allele GB:FGF2 197 5466 5466
source isSNP SNP00036342
consequence GB:FGF2.1 198
Allele GB:FGF2 197 5892 5892
source isSNP SNP00062439
consequence GB:FGF2.1 198
Allele GB:FGF2 197 5937 5937
source isSNP SNP00062440
consequence GB:FGF2.1 198
GIF FGF2-cdna-fwd.gif



FGFR1

Full name : Fibroblast growth factor receptor-1
Link : FGFR1_link_cdna
Subsequence GB:M34185_1 1 3365 #199
CDS GB:M34185_1.1 2202 bp #200
ORF 256 2457
Allele GB:M34185_1 199 1471 1471
source isSNP SNP00107960
consequence GB:M34185_1.1 200
Allele GB:M34185_1 199 3224 3224 G>T
source isSNP SNP00107961
consequence GB:M34185_1.1
GIF FGFR1-cdna-fwd.gif



FMOD 

Full name : fibromodulin
Link : FMOD_link_cdna
Subsequence GB:FMOD 1 2863
CDS GB:FMOD.1 1131 bp #202
ORF 21 1151
Allele GB:FMOD 201 2653 2653
source isSNP SNP00001499
consequence GB:FMOD.1 202
Allele GB:FMOD 201 2739 2739
source
consequence GB:FMOD.1 202 3'
GIF FMOD-cdna-fwd.gif



FRZB 

Full name : frizzled-related protein
Link : FRZB_link_cdna
Subsequence GB:U91903_1 1 #203
CDS GB:U91903_1.1 978 bp #204
ORF 70 1047
Allele GB:U91903_1 203 667 667
source isSNP SNP00016790
consequence GB:U91903_1.1 204
Allele GB:U91903_1 203 1039 1039 C>G
source isSNP SNP00001065
consequence GB:U91903_1.1 204
Allele GB:U91903_1 203 1259 1259 A>G
source isSNP SNP00001066
consequence GB:U91903_1.1 204
Allele GB:U91903_1 203 1305 1305 A>G
source isSNP SNP00016791
consequence GB:U91903_1.1
GIF FRZB-cdna-fwd.gif



FST 

Full name : Follistatin 
Link : FST_link_cdna 

Subsequence GB:FST 1 954 #205
CDS GB:FST.1 954 bp #206
ORF 1 954
Allele GB:FST 205 454 454 A>G
source isSNP SNP00015508
consequence GB:FST.1 206 Missense
Allele GB:FST 205 853 853 C>G
source isSNP SNP00052278
consequence GB:FST.1 206 Missense

GIF FST-cdna-fwd.gif 
Link : FST_link_genomic 

Subsequence FST_cds.1 77877 73442 #207
Subsequence GB:AC008901_2 1 192639
Subsequence FST_mrna_build.1 77877 73440 #209
CDS FST_cds.1 951 bp 5 exons #207
exon 77877 77793
exon 75788 75597
exon 75164 74946
exon 74599 74375
exon 73671 73442
mRNA FST_mrna_build.1 953 bp
exon 77877 77793
exon 75788 75597
exon 75164 74946
exon 74599 74375
exon 73671 73440
Allele GB:AC008901_2 208 73454 73454 A>G
source wetSNP GB:AC008901_2.v73454.G>A
consequence FST_cds.1 207 Silent 313-313
Allele GB:AC008901_2 208 73540 73540 C>G
source isSNP SNP00052278
consequence FST_cds.1 207 Missense
Allele GB:AC008901_2 208 74988 74988 A>G
source isSNP SNP00015508
consequence FST_cds.1 207 Missense
Allele GB:AC008901_2 208 76361 76361 C>G
source dbSNP gnl|dbSNP|ss42460_allele
consequence FST_cds.1 207 Intron
Allele GB:AC008901_2 208 76373 76373 A>G
source dbSNP gnl|dbSNP|ss1048607_allele
source dbSNP gnl|dbSNP|ss226044_allele
consequence FST_cds.1 207 Intron
Allele GB:AC008901_2 208 76384 76384 A>G
source dbSNP gnl|dbSNP|ss839844_allele
consequence FST_cds.1 207 Intron
GIF FST-genomic-rev.gif



G0S2 

Full name : putative lymphocyte G0/G1 switch gene
Link : FL_3732868_link_genomic
Subsequence GB:HS28O10 1 97700 #210
Subsequence GB:HS28O10_3732868CD1 52369 52680 #211
Subsequence FL_3732868_mrna_build.1 52008 53073 #212
mRNA FL_3732868_mrna_build.1 963 bp 2 exons #212
exon 52008 52233
exon 52337 53073
CDS GB:HS28O10_3732868CD1 312 bp 1 exon #211
exon 52369 52680
Allele GB:HS28O10 210 52341 52341 A>G
source isSNP SNP00039143
source wetSNP GB:HS28O10.v52341.T>C
consequence GB:HS28O10_3732868CD1 211 5'
GIF G0S2-genomic-fwd.gif 



GADD34

Full name : growth arrest and DNA-damage-inducible 34
Link : GADD34_link_cdna
Subsequence GB:HSU83981 1 2331 #213
CDS GB:HSU83981.1 2025 bp #214
ORF 223 2247
GB:HSU83981 213 205 205
source isSNP SNP00116263
consequence GB:HSU83981.1
GB:HSU83981 213 314 314
source isSNP SNP00116264
consequence GB:HSU83981.1






214 
A>G 



31-31 R>H 
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Missense 



Silent 



GB:HSU83981 213 316 316 A>G
source isSNP SNP00029694
consequence GB:HSU83981.1 214 Missense
GB:HSU83981 213 974 974 C>G
isSNP SNP00006368
GB:HSU83981.1 214 Missense
213 1051 1051 A>G
isSNP SNP00006369
GB:HSU83981.1
213 1156 1156
isSNP SNP00006370
GB:HSU83981.1
213 1605 1605
isSNP SNP00069978
GB:HSU83981.1
213 1650 1650
isSNP SNP00069979
GB:HSU83981.1
213 2011 2011
isSNP SNP00006372
GB:HSU83981.1
213 2184 2184
isSNP SNP00006373
GB:HSU83981.1
213 2199 2199
isSNP SNP00006374
GB:HSU83981.1
GIF GADD34-cdna-fwd.gif
Link : GADD34_link_genomic
Subsequence GADD34_cds.1 221390
Subsequence GB:AC026803_2 1 247509 #216
Subsequence GADD34_mrna_build.1 220595



Alle 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



source 
consequence 
GB:HSU8398i 
source . 
cons equence 
6B:HSU83981 
source 
consequence 
GB:HSU83981 
source 
consequence 
GB:HSU83981 
source 
c ons equenc e 
GB:HSU83981 
source 
consequence 
QB:HSU83981 
source 
consequence 
QB:HSU83981 
source 
consequence 



214 
A>G 



214 
A>G 



214 
G>T 



214 
A>G 



214 
A>G 



214 
OG 



214 



32-32 A>T 



Miss^ 



Missense 



Silent 



Silent 



224129 



mRNA GADD34_mrna_build.1 2331 bp 3 exons #217
exon 220595 220807
exon 221381 223054
exon 223770 224213
CDS GADD34_cds.1 2025 bp
exon 221390 223054
exon 223770 224129
Allele GB:AC026803_2 216 221481
source isSNP SNP00116264
consequence GADD34_cds.1 215 Missense 31-31 R>H
Allele GB:AC026803_2 216 221483 221483 A>G
source isSNP SNP00029694
consequence GADD34_cds.1 215 Missense 32-32 A>T
Allele GB:AC026803_2 216 221941 221941 A>G
source wetSNP GB:AC026803_2.v221941.G>A
consequence GADD34_cds.1 215 Silent 184-184
Allele GB:AC026803_2 216 221985 221985 A>G
source wetSNP GB:AC026803_2.v221985.T>C
consequence GADD34_cds.1 215 Missense 199-199
Allele GB:AC026803_2 216 222141 222141 C>G
source isSNP SNP00006368
source wetSNP GB:AC026803_2.v222141.G>C
consequence GADD34_cds.1 215 Missense 251-251
Allele GB:AC026803_2 216 222218
source isSNP SNP00006369
consequence GADD34_cds.1 215 Missense 277-277
Allele GB:AC026803_2 216 222323 222323 A>G
source isSNP SNP00006370
consequence GADD34_cds.1 215 Missense 312-312
Allele GB:AC026803_2 216 222772 222772 A>G
source isSNP SNP00069978
consequence GADD34_cds.1 215 Silent 461-461
Allele GB:AC026803_2 216 222817 222817 G>T
source isSNP SNP00069979
consequence GADD34_cds.1 215 Missense 476-476
Allele GB:AC026803_2 216 223893 223893 A>G
source isSNP SNP00006372
consequence GADD34_cds.1 215 Missense 597-597
Allele GB:AC026803_2 216 224066 224066 A>G
source isSNP SNP00006373
consequence GADD34_cds.1 215 Silent 654-654
Allele GB:AC026803_2 216 224081 224081 C>G
source isSNP SNP00006374
consequence GADD34_cds.1 215
GIF GADD34-genomic-fwd.gif



GLI 

Full name : glioma-associated oncogene homolog
Link : GLI_link_cdna
Subsequence GB:NM_005269_1 1 3600
CDS GB:NM_005269_1.1 3321 bp #219
ORF 79 3399
Allele GB:NM_005269_1 218 2179
source isSNP SNP00018615
consequence GB:NM_005269_1.1 219 Missense
Allele GB:NM_005269_1 218 2202 2202 A>G
source isSNP SNP00072776
consequence GB:NM_005269_1.1 219 Silent
Allele GB:NM_005269_1 218 2876 2876 A>G
source isSNP SNP00112595
consequence GB:NM_005269_1.1 219 Missense
Allele GB:NM_005269_1 218 3243 3243 C>G
source isSNP SNP00018616
consequence GB:NM_005269_1.1 219 Missense
Allele GB:NM_005269_1 218 3376 3376 C>G
source isSNP SNP00018617
consequence GB:NM_005269_1.1 219
GIF GLI-cdna-fwd.gif



GLI3 

Full name : GLI-Kruppel family member GLI3 

Link : GLI3_link_cdna
Subsequence GB:NM_000168_1 1 5046 #220
CDS GB:NM_000168_1.1 4791 bp #221
ORF 55 4845
Allele GB:NM_000168_1 220 4502 4502 A>G
source isSNP SNP00031650
consequence GB:NM_000168_1.1 221 Missense
Allele GB:NM_000168_1 220 4663 4663 A>G
source isSNP SNP00073523
consequence GB:NM_000168_1.1 221 Missense
GIF GLI3-cdna-fwd.gif



HAS1

Full name : hyaluronan synthase 1
Link : HAS1_link_cdna
Subsequence GB:NM_001523 1 2088 #222
CDS GB:NM_001523.1 1737 bp #223
ORF 36 1772
Allele GB:NM_001523 222 75 75 A>G
source isSNP SNP00096015
consequence GB:NM_001523.1 223 Missense 14-14 R>C
Allele GB:NM_001523 222 1889 1889 G>T
source isSNP SNP00064738
consequence GB:NM_001523.1 223
GIF HAS1-cdna-fwd.gif
Link : HAS1_link_genomic
Subsequence HAS1_cds.1 153154 142648 #224
Subsequence GB:AC018755_2 1 231222 #225
Subsequence HAS1_mrna_build.1 153189 142333 #226
CDS HAS1_cds.1 1737 bp 5 exons #224
exon 153154 153146
exon 149119 148427
exon 146414 146189
exon 145609 145477
exon 143323 142648
mRNA HAS1_mrna_build.1 2087 bp 5 exons #226
exon 153189 153146
exon 149119 148427
exon 146414 146189
exon 145609 145477
exon 143323 142333
Allele GB:AC018755_2 225 142531 142531 G>T
source isSNP SNP00064738
consequence HAS1_cds.1 224 3'
Allele GB:AC018755_2 225 147775 147775 G>T
source dbSNP gnl|dbSNP|ss715930_allele
consequence HAS1_cds.1 224 Intron
Allele GB:AC018755_2 225 149089 149089 A>G
source isSNP SNP00096015
consequence HAS1_cds.1 224 Missense 14-14 C>R
Allele GB:AC018755_2 225 149293 149293 C>G
source dbSNP gnl|dbSNP|ss713606_allele
consequence HAS1_cds.1 224 Intron
GIF HAS1-genomic-rev.gif
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HAS2 

Full name : hyaluronan synthase 2 
Link : HAS2_link_cdna 

Subsequence GB:NM_005328 1 3003 #227
CDS GB:NM_005328.1 1659 bp #228
ORF 536 2194
Allele GB:NM_005328 227 381 381 A>G
source isSNP SNP00072998
consequence GB:NM_005328.1 228 5'
Allele GB:NM_005328 227 1357 1357 G>T
source isSNP SNP00104961
consequence GB:NM_005328.1 228 Missense

GIF HAS2-cdna-fwd.gif 



: proteoglycan 



HSPG2 

Full name : heparan sulfate 
Link : HSPG2_link_cdna 

Subsequence GB:NM_005529_2 1 13793

CDS GB:NM_005529_2.1 13182 bp #230
ORF 41 13222

Allele GB:NM_005529_2 229 2155 2155
source isSNP SNP00054627
consequence GB:NM_005529_2.1 230 Silent 705-705 A

Allele GB:NM_005529_2 229 2340 2340 A>G
source isSNP SNP00054628
consequence GB:NM_005529_2.1 230 Missense 767-767 S>N

Allele GB:NM_005529_2 229 3603 3603 A>G
source isSNP SNP00109135
consequence GB:NM_005529_2.1 230 Missense 1188-1188 R>Q

Allele GB:NM_005529_2 229 3734 3734 A>G
source isSNP SNP00109136
consequence GB:NM_005529_2.1 230 Missense 1232-1232 G>S

Allele GB:NM_005529_2 229 3943 3943 A>G
source isSNP SNP00054629
consequence GB:NM_005529_2.1 230 Silent 1301-1301 V

Allele GB:NM_005529_2 229 4032 4032 A>G
source isSNP SNP00054630
consequence GB:NM_005529_2.1 230 Missense 1331-1331 G>D

Allele GB:NM_005529_2 229 4554 4554 A>G
source isSNP SNP00109138
consequence GB:NM_005529_2.1 230 Missense 1505-1505 V>A

Allele GB:NM_005529_2 229 7042 7042 A>G
source isSNP SNP00048871
consequence GB:NM_005529_2.1 230 Silent 2334-2334

Allele GB:NM_005529_2 229 7503 7503 A>G
source isSNP SNP00109139
consequence GB:NM_005529_2.1 230 Missense 2488-2488 S>L

Allele GB:NM_005529_2 229 9548 9548 A>G
source isSNP SNP00109140
consequence GB:NM_005529_2.1 230 Missense 3170-3170 T>A

Allele GB:NM_005529_2 229 10294 10294 A>G
source isSNP SNP00109141
consequence GB:NM_005529_2.1 230 Silent 3418-3418 S
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Allele GB:NM_005529_2 229 10663 10663 A>G
source isSNP SNP00109142
consequence GB:NM_005529_2.1 230 Silent 3541-3541

Allele GB:NM_005529_2 229 10941 10941 A>G
source isSNP SNP00109143
consequence GB:NM_005529_2.1 230 Missense 3634-3634

Allele GB:NM_005529_2 229 11233 11233 G>T
source isSNP SNP00009830
consequence GB:NM_005529_2.1 230 Silent

Allele GB:NM_005529_2 229 12358 12358 A>G
source isSNP SNP00009831
consequence GB:NM_005529_2.1 230 Silent

Allele GB:NM_005529_2 229 12604 12604 A>G
source isSNP SNP00038416
consequence GB:NM_005529_2.1 230 Silent 4188-4188
GIF HSPG2-cdna-fwd.gif



IBSP 

Full name : IBSP 
Link : IBSP_link_cdna 

Subsequence GB:HUMSIALO 1 1037 #231

CDS GB:HUMSIALO.1 954 bp #232

ORF 72 1025
Allele GB:HUMSIALO 231 494 494
source isSNP SNP00065793
consequence GB:HUMSIALO.1 232

Allele GB:HUMSIALO 231 655 655 A>G
source isSNP SNP00065794
consequence GB:HUMSIALO.1 232

Allele GB:HUMSIALO 231 709 709 A>G
source isSNP SNP00018906
consequence GB:HUMSIALO.1
GIF IBSP-cdna-fwd.gif
Link : IBSP_link_genomic

Subsequence GB:HUMBNSP01 1 2415 #233
Subsequence GB:HUMBNSP02 2516 3359 #234
Subsequence GB:HUMBNSP03 3460 5094 #235
Subsequence GB:HUMBNSP04 5195 9497 #236
Subsequence IBSP_cds.1 2863 7195 #237

CDS IBSP_cds.1 954 bp 6 exons #237

exon 2863 2916
exon 3009 3059
exon 3158 3235
exon 3571 3633
exon 5882 6040
exon 6647 7195

Allele GB:HUMBNSP04 236 1631
source isSNP SNP00065794
consequence IBSP_cds.1 237 Missense

Allele GB:HUMBNSP04 236 1685 1685 A>G
source isSNP SNP00018906
consequence IBSP_cds.1 237
GIF IBSP-genomic-fwd.gif




WO 03/054166



PCT/US02/41225 



TABLE 1 (Cont.) 



IER3 

Full name : immediate early response 3
Link : IER3_link_cdna

Subsequence GB:Y14551_1 1 1230 #238

CDS GB:Y14551_1.1 471 bp #239

ORF 12 482
Allele GB:Y14551_1 238 838 838 A>G

source isSNP SNP00052893

consequence GB:Y14551_1.1 239 3'

GIF IER3-cdna-fwd.gif
Link : FL_758754_link_genomic

Subsequence GB:AC006165 1 44118 #240

Subsequence GB:AC006165_2619577CD1 14601 15183 #241

Subsequence FL_2619577_mrna_build.1 14585 15920 #242

mRNA FL_2619577_mrna_build.1 1224 bp 2 exons #242

exon 14585 14810
exon 14923 15920
CDS GB:AC006165_2619577CD1 471 bp 2 exons #241

exon 14601 14810
exon 14923 15183
Allele GB:AC006165 240 15539 15539 A>G

source isSNP SNP00052893

consequence GB:AC006165_2619577CD1 241 3'
GIF IER3-genomic-fwd.gif



IHH 

Full name : IHH 
Link : IHH_link_cdna

Subsequence GB:HUMIHH 1 1277 #243

CDS GB:HUMIHH.2 939 bp #244

ORF 2 940

Allele GB:HUMIHH 243 457 457 A>G
source isSNP SNP00097225
consequence GB:HUMIHH.2 244 Silent 152-152
GIF IHH-cdna-fwd.gif

Link : IHH_link_genomic

Subsequence IHH_cds.1 1 1469 #245
Subsequence GB:AB010092_1 1 315 #246
Subsequence GB:AB018075_1 416 698 #247
Subsequence GB:AB018076_1 799 1481 #248

CDS IHH_cds.1 1236 bp 3 exons #245

exon 1 315
exon 426 687
exon 811 1469

Allele GB:AB018075_1 247 194 194 A>G
source wetSNP GB:AB018075_1.v194.G>A
consequence IHH_cds.1 245 Missense 167-167

Allele GB:AB018076_1 248 188 188 A>G
source isSNP SNP00097225
consequence IHH_cds.1 245 Silent
GIF IHH-genomic-fwd.gif
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INHBA 

Full name : inhibin, beta A 
Link : FL_3526170_link_cdna 

Subsequence FN:3526170CB1 1 1620 #249

CDS FN:3526170CB1.1 1281 bp #250

ORF 216 1496
Allele FN:3526170CB1 249 607 607 G>T

source isSNP SNP00068777

consequence FN:3526170CB1.1 250 Missense
GIF INHBA-cdna-fwd.gif
Link : FL_3526170_link_genomic

Subsequence GB:AC005027 1 199878 #251

Subsequence GB:AC005027_3526170CD1 16865 54957 #252

Subsequence FL_3526170_mrna_build.1 14163 55081 #253

mRNA FL_3526170_mrna_build.1 1620 bp 3 exons

exon 14163 14234
exon 16722 17252
exon 54065 55081
CDS GB:AC005027_3526170CD1 1281 bp 2 exons #252

exon 16865 17252
exon 54065 54957
Allele GB:AC005027 251 16377 16377 A>G

source dbSNP gnl|dbSNP|ss577365_allele

source dbSNP gnl|dbSNP|ss588511_allele

consequence GB:AC005027_3526170CD1 252 5'
GIF INHBA-genomic-fwd.gif



IRS1

Full name : Insulin receptor substrate 1
Link : IRS1_link_cdna

Subsequence EM:S62539 1 5828 

CDS EM:S62539.1 3729 bp #255 
ORF 1021 4749 
EM:S62539 



Allele 



consequence 

Allele EM:S62539 
source 
consequence 

Allele EM:S62539 
source 
consequence 

GIF IRSl-cdna-fwd.gif 
Link : IRSl_link_genomic 

Subsequence 

Subsequence 

Subsequence 

CDS IRSl_cds.l 
exon 680 

iriRNA 

exon 



254 3388 3388 
isSNP SNP00067005 
EM:S62539.1 255 
254 3887 3887 
isSNP SNP00114530 
EM:S62539.1 255 
254 5156 5156 
isSNP SNPb0067006 
EM:S62539.1 255 



EM:S85963 100 6251 
IRSl_cds.l 680 4411 
IRSl_jnma_build.l 100 
3732 bp 1 exon 

4411 

IRSl_mma_build.l - 4333 bp 
100 4432 



Missense 
G>T 



#256 
#257 

4432 #258 
#257 



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Allele EM:S85963 256 850 850 A>G
source wetSNP EM:S85963.v850.C>T
consequence IRS1_cds.1 257 Silent 90-90 D

Allele EM:S85963 256 1285 1285 A>G
source wetSNP EM:S85963.v1285.G>A
consequence IRS1_cds.1 257 Silent 235-235

Allele EM:S85963 256 1783 1783 A>G
source wetSNP EM:S85963.v1783.T>C
consequence IRS1_cds.1 257 Silent 401-401

Allele EM:S85963 256 2023 2023 A>G
source wetSNP EM:S85963.v2023.C>T
consequence IRS1_cds.1 257 Silent 481-481

Allele EM:S85963 256 2117 2117 C>G
source wetSNP EM:S85963.v2117.G>C
consequence IRS1_cds.1 257 Missense 513-513

Allele EM:S85963 256 2697 2697 A>G
source wetSNP EM:S85963.v2697.G>A
consequence IRS1_cds.1 257 Missense 706-706

Allele EM:S85963 256 2941 2941 A>G
source wetSNP EM:S85963.v2941.T>C
consequence IRS1_cds.1 257 Silent 787-787

Allele EM:S85963 256 2951 2951 A>G
source isSNP SNP00067005
consequence IRS1_cds.1 257 Missense 791-791

Allele EM:S85963 256 2995 2995 A>G
source wetSNP EM:S85963.v2995.A>G
consequence IRS1_cds.1 257 Silent 805-805

Allele EM:S85963 256 3035 3035 C>G
source wetSNP EM:S85963.v3035.G>C
consequence IRS1_cds.1 257 Missense 819-819

Allele EM:S85963 256 3262 3262 C>G
source wetSNP EM:S85963.v3262.G>C
consequence IRS1_cds.1 257 Silent 894-894

Allele EM:S85963 256 3349 3349 A>G
source wetSNP EM:S85963.v3349.G>A
consequence IRS1_cds.1 257 Silent 923-923

Allele EM:S85963 256 3450 3450 A>G
source isSNP SNP00114530
consequence IRS1_cds.1 257 Missense 957-957

Allele EM:S85963 256 3494 3494 A>G
source wetSNP EM:S85963.v3494.G>A
consequence IRS1_cds.1 257 Missense 972-972

Allele EM:S85963 256 4053 4053 A>G
source wetSNP EM:S85963.v4053.G>A
consequence IRS1_cds.1 257 Missense 1158-1158

GIF IRS1-genomic-fwd.gif



Full name : v-jun avian sarcoma virus 17 oncogene homolog 

Link : JUN_link_genomic

Subsequence JUN_cds.1 9468 8473 #259

Subsequence GB:AL136985_1 1 151212 #260

Subsequence JUN_mrna_build.1 9468 8473 #261
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CDS JUN_cds.l 996 bp 1 exon #259 

exon 9468 8473 
mRNA JUN_mrna_build.1 996 bp 1 exon #261

exon 9468 8473 
GIF JUN-genomic-rev.gif 



KJ_OA11

Full name : KIAA1253

Link : FL_2135776_link_cdna

Subsequence FN:2135776CB1 1 3129

CDS FN:2135776CB1.1 1197 bp #263

ORF 256 1452
Allele FN:2135776CB1 262 59 59
source isSNP SNP00100733
consequence FN:2135776CB1.1 263

Allele FN:2135776CB1 262 1352 1352
source isSNP SNP00116557
consequence FN:2135776CB1.1 263 Missense

Allele FN:2135776CB1 262 1477 1477 A>G
source isSNP SNP00042286
consequence FN:2135776CB1.1 263

Allele FN:2135776CB1 262 1489 1489
source isSNP SNP00042287
consequence FN:2135776CB1.1 263

Allele FN:2135776CB1 262 1667 1667
source isSNP SNP00011480
consequence FN:2135776CB1.1 263

Allele FN:2135776CB1 262 1710 1710
source isSNP SNP00011481
consequence FN:2135776CB1.1 263

Allele FN:2135776CB1 262 1838 1838
source isSNP SNP00011482
consequence FN:2135776CB1.1 263

Allele FN:2135776CB1 262 2589 2589
source isSNP SNP00003671
consequence FN:2135776CB1.1
GIF KJ_OA11-cdna-fwd.gif
Link : FL_2135776_link_genomic

Subsequence GB:HS425C14 1 160203 #264

Subsequence GB:HS425C14_2135776CD1 55766 42255 #265

Subsequence FL_2135776_mrna_build.1 69012 40562 #266

Subsequence KJ_OA11_cds.1 55766 51052 #267

CDS GB:HS425C14_2135776CD1 1197 bp 9 exons #265



exon 55766 55731
exon 53861 53692
exon 51441 51362
exon 51118 50981
exon 49268 49099
exon 48965 48875
exon 44476 44332
exon 44215 43985
exon 42390 42255

mRNA FL_2135776_mrna_build.1
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exon 69012 68910
exon 55892 55731
exon 53861 53692
exon 51441 51362
exon 51118 50981
exon 49268 49099
exon 48965 48875
exon 44476 44332
exon 44215 43985
exon 42390 40562

CDS KJ_OA11_cds.1
exon 55766 55731
exon 53861 53692
exon 51118 51052



GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

consequence 

consequence 

QB:HS425C14 

source 

consequence 

consequence 

GB:HS425C14 

source 

source 

consequence 

consequence 

GB:HS425C14 

source 



consec[uence 



264 41092 41092 A>G
isSNP SNP00003671
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 41843 41843 A>G
isSNP SNP00011482
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 41971 41971 A>G
isSNP SNP00011481
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 42014 42014 A>G
isSNP SNP00011480
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267
264 42192 42192 A>G
isSNP SNP00042287
GB:HS425C14_2135776CD1
KJ_OA11_cds.1 267 3'

264 42204 42204 A>G
isSNP SNP00042286

wetSNP GB:HS425C14.v42204.G>A

GB:HS425C14_2135776CD1 265 3'

KJ_OA11_cds.1 267 3'

264 42294 42294 C>G

wetSNP GB:HS425C14.v42294.G>C

wetSNP GB:HS425C14.v42294.G>C

GB:HS425C14_2135776CD1 265 Silent



265 



265 
3' 



265 
3' 



265 



265 



S>G 
Allele 



consequence 
GB:HS425C14 



KJ_OAll_cds . 1 267 
264 42329 42329 A>G 
isSNP SNP00116557 
consequence GB:HS425C14_2135776CD1 



265 



consequence KJ_OA11_cds.1 267 3'

GB:HS425C14 264 44297 44297 A>G

source wetSNP GB:HS425C14.v44297.T>C

consequence GB:HS425C14_2135776CD1 265 Intron

consequence KJ_OA11_cds.1 267 3'

GB:HS425C14 264 55697 55697 A>G
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TABLE 1 (Cont.) 

source wetSNP GB:HS425C14.v55697.G>T

consequence GB:HS425C14_2135776CD1 265 Intron

consequence KJ_OA11_cds.1 267 Intron

Allele GB:HS425C14 264 68954 68954 C>G

source isSNP SNP00100733

consequence GB:HS425C14_2135776CD1 265 5'

consequence KJ_OA11_cds.1 267 5'

GIF KJ_OA11-genomic-rev.gif



KJ_OA2 

Link : KJ_OA2_link_cdna 

Subsequence LG:244552.16 1 1825 #268

Allele LG:244552.16 268 1476 1476 G>T

source isSNP SNP00098862 



KJ_OA21 

Full name : FL project 2027624 
Link : FL_2027624_link_cdna

Subsequence FN:2027624CB1 1 2173 #269

CDS FN:2027624CB1.1 1734 bp #270

ORF 4 1737

Allele FN:2027624CB1 269 881 881 C>G
source isSNP SNP00106459
consequence FN:2027624CB1.1 270 Missense 293-293 T>R

Allele FN:2027624CB1 269 971 971 A>G
source isSNP SNP00075286
consequence FN:2027624CB1.1 270 Missense 323-323 T>I

Allele FN:2027624CB1 269 1092 1092 C>G
source isSNP SNP00106460
consequence FN:2027624CB1.1 270 Silent 363-363 L

Allele FN:2027624CB1 269 1254 1254 A>G
source isSNP SNP00075287
consequence FN:2027624CB1.1 270 Silent 417-417 Q

Allele FN:2027624CB1 269 1374 1374 A>G
source isSNP SNP00009699
consequence FN:2027624CB1.1 270 Silent 457-457 T

Allele FN:2027624CB1 269 1392 1392 A>G
source isSNP SNP00097916
consequence FN:2027624CB1.1 270 Silent 463-463 G

Allele FN:2027624CB1 269 1623 1623 A>G
source isSNP SNP00009700
consequence FN:2027624CB1.1 270 Silent 540-540 Y

GIF KJ_OA21-cdna-fwd.gif
Link : FL_1250708_link_genomic
Subsequence GB:HS453C12 1 147620 #271
Subsequence GB:HS453C12_1394592CD1 87967 109084 #272
Subsequence GB:HS453C12_2027624CD1 20194 10528 #273
Subsequence FL_1394592_mrna_build.1 87945 110578 #274
Subsequence FL_2027624_mrna_build.1 20197 6152 #275
Subsequence OA21_cds.1 20194 17050 #276
mRNA FL_2027624_mrna_build.1 13 exons #275
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TABLE 1 (Cont.) 



exon 20197 20008
exon 19834 19657
exon 17499 17372
exon 17056 16956
exon 16847 16761
exon 16215 16128
exon 16019 15922
exon 15823 15658
exon 14968 14768
exon 12135 11970
exon 11855 11772
exon 10777 10110
exon 6168 6152

CDS OA21_cds.1 372 bp
exon 20194 20008
exon 19834 19657
exon 17056 17050

CDS GB:HS453C12_202762.
exon 20194 20008
exon 19834 19657
exon 17499 17372
exon 17056 16956
exon 16847 16761
exon 16215 16128
exon 16019 15922
exon 15823 15658
exon 14968 14768
exon 12135 11970
exon 11855 11772
exon 10777 10528




GB:HS453C12 271 10642 10642 A>G

source isSNP SNP00009700

source wetSNP GB:HS453C12.v10642.A>G

source wetSNP GB:HS453C12.v10642.A>G

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273

GB:HS453C12 271 11206 11206 A>G

source dbSNP gnl|dbSNP|ss979258_allele

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273 Intron

GB:HS453C12 271 11999 11999 A>G

source isSNP SNP00009699

source wetSNP GB:HS453C12.v11999.C>T

source wetSNP GB:HS453C12.v11999.C>T

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273 Silent



GB:HS453C12 271 13494 13494 A>G

source isSNP SNP00095042

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1

GB:HS453C12 271 14913 14913 C>G

source isSNP SNP00106460

consequence OA21_cds.1 276 3'
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TABLE 1 (Cont.) 

consequence GB:HS453C12_2027624CD1 273 Silent

GB:HS453C12 271 15723 15723 A>G

source isSNP SNP00075286

consequence OA21_cds.1 276 3'

consequence GB:HS453C12_2027624CD1 273 Missense



GIF KJ_OA21-genomic-rev.gif 



KJ_OA29 

Link : KJ_OA29_link_cdna 
Subsequence LG:199489.1 1 3318 #277

Allele LG:199489.1 277 544 544 A>G
source isSNP SNP00005297

Allele LG:199489.1 277 695 695 A>G
source isSNP SNP00121995

Allele LG:199489.1 277 971 971 A>G
source isSNP SNP00047679

Allele LG:199489.1 277 1312 1312 A>G
source isSNP SNP00005298

Allele LG:199489.1 277 1445 1445 A>G
source isSNP SNP00027647

Allele LG:199489.1 277 2370 2370 A>G
source isSNP SNP00005297

Allele LG:199489.1 277 2521 2521 A>G
source isSNP SNP00121995

Allele LG:199489.1 277 2797 2797 A>G
source isSNP SNP00047679

Allele LG:199489.1 277 3138 3138 A>G
source isSNP SNP00005298

Allele LG:199489.1 277 3271 3271 A>G
source isSNP SNP00027647



KJ_OA3

Link : KJ_OA3_link_cdna

Subsequence LG:153511.1 1 1628 #278

Allele LG:153511.1 278 395 395 A>G

source isSNP SNP00003503

Allele LG:153511.1 278 1101 1101 A>G

source isSNP SNP00113687



KJ_OA31 

Link : KJ_OA31_link_cdna 

Subsequence LG:200972.2 1 2192 #279

Allele LG:200972.2 279 366 366 C>G

source isSNP SNP00099556

Allele LG:200972.2 279 836 836 A>G

source isSNP SNP00015954

Allele LG:200972.2 279 1037 1037 A>G
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source isSNP SNP00015955

Allele LG:200972.2 279 1361 1361 A>G

source isSNP SNP00000598

Allele LG:200972.2 279 1697 1697 A>G

source isSNP SNP00000599

Allele LG:200972.2 279 1975 1975 A>G

source isSNP SNP00067907

Allele LG:200972.2 279 2027 2027 A>G

source isSNP SNP00067908





KJ_OA33 

Full name : cardiotrophin-like cytokine 

Link : FL_1676240_link_genomic 

Subsequence GB:AC005849_1 1 169144 #280

Subsequence KJ_OA33_cds.1 151862 143455 #281

Subsequence KJ_OA33_mrna_build.1 151907 142489 #282

CDS KJ_OA33_cds.1 678 bp #281
exon 151862 151847
exon 145945 145779
exon 143949 143455

mRNA KJ_OA33_mrna_build.1
exon 151907 151847
exon 145945 145779
exon 143949 142489

GIF KJ_OA33-genomic-rev.gif



KJ_OA39 

Link : KJ_OA39_link_cdna 

Subsequence LG:293953.1 1 940 #283

Allele LG:293953.1 283 679 679 G>T

source isSNP SNP00110603 



KJ_OA6 

Full name : FL project 2840746 
Link : FL_818498_link_genomic

Subsequence GB:AC005598 1 190000 #284

Subsequence GB:AC005598_2840746CD1 132700 133368 #285

Subsequence FL_2840746_mrna_build.1 132672 135584 #286

CDS GB:AC005598_2840746CD1 669 bp 1 exon #285

exon 132700 133368

mRNA FL_2840746_mrna_build.1 1087 bp 2 exons

exon 132672 133391
exon 135218 135584

Allele GB:AC005598 284 132689 132689 A>G
source isSNP SNP00005520
consequence GB:AC005598_2840746CD1 285 5'

Allele GB:AC005598 284 132843 132843 A>G
source wetSNP GB:AC005598.v132843.C>T
consequence GB:AC005598_2840746CD1 285 Silent

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Allele GB:AC005598 284 132878 132878 A>G
source wetSNP GB:AC005598.v132878.G>A
consequence GB:AC005598_2840746CD1 285 Missense 60-60 R>H

Allele GB:AC005598 284 132951 132951 A>G
source wetSNP GB:AC005598.v132951.C>T
consequence GB:AC005598_2840746CD1 285 Silent

Allele GB:AC005598 284 132967 132967 A>G
source wetSNP GB:AC005598.v132967.C>T
consequence GB:AC005598_2840746CD1 285 Missense 90-90 P>S

Allele GB:AC005598 284 133103 133103 G>T
source wetSNP GB:AC005598.v133103.G>T
consequence GB:AC005598_2840746CD1 285 Missense G>V

Allele GB:AC005598 284 133481 133481 A>G
source wetSNP GB:AC005598.v133481.C>T
consequence GB:AC005598_2840746CD1 285 3'
GIF KJ_OA6-genomic-fwd.gif




KJ_oagba3

Link : KJ_oagba3_link_cdna
Subsequence LG:215642.2 2849 #287

Allele LG:215642.2 287 1475 1475 A>G
source isSNP SNP00041601

Allele LG:215642.2 287 1963 1963
source isSNP SNP00010951



LIF 

Full name : leukemia inhibitory 
Link : LIF_link_cdna 

Subsequence GB : LIF 

CDS GB:LIF,1 609 bp 

ORP 45 653 

Allele GBiLIF 288 

source i sSNP 

consequence GB:LIF 

Allele GB:LIF 288 

source isSNP 
consequence QB:LIF 

Allele GB:LIF 288 

source isSNP 
consequence GB : LIF 

Allele GB:LIF 288 

source 

consequence GB : LIF 



GB-.LIF 

source 

consequence 

GB : LIF 

source 

consequence 

GBiLIF 



288 
isSNP 
GB : LIF 
288 
isSNP 
GB : LIF 
288 



1 3848 
#289 

1183 1183 
SNP00036337 

289 
1572 1572 
SNP00099092 

1 289 
1996 1996 
SNP00099093 
.1 289 
2062 2062 
SNP00099094 

289 
2404 ^ 2404 
SNPOob99 095 

1 289 
3156 3156 
SNP00036338 

289 
3582 3582 



3' 
A>G 



3' 
G>T 



3' 
A>G 



3' 

A>a 
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source isSNP SNP00008778 

consequence GB:LIF.l 289 3' 

GIF LIF-cdna-fwd.gif 
Link : OSM_link_genomic

Subsequence GB:AC004264 1 47188 #290

Subsequence LIF_cds.1 11398 8354 #291

Subsequence LIF_mrna_build.1 11442 5156 #292

CDS LIF_cds.1 609 bp 3 exons #291

exon 11398 11380
exon 9636 9458
exon 8764 8354

mRNA LIF_mrna_build.1 3851 bp 3 exons

exon 11442 11380
exon 9636 9458
exon 8764 5156

Allele GB:AC004264 290 5420 5420 A>G
source isSNP SNP00008778
consequence LIF_cds.1 291 3'

Allele GB:AC004264 290 5846 5846 A>G
source isSNP SNP00036338
consequence LIF_cds.1 291 3'

Allele GB:AC004264 290 6598 6598 A>G
source isSNP SNP00099095
consequence LIF_cds.1 291 3'

Allele GB:AC004264 290 6940 6940 G>T
source isSNP SNP00099094
consequence LIF_cds.1 291 3'

Allele GB:AC004264 290 7006 7006 C>G
source isSNP SNP00099093
consequence LIF_cds.1 291 3'

Allele GB:AC004264 290 7435 7435 A>G
source isSNP SNP00099092
consequence LIF_cds.1 291 3'

Allele GB:AC004264 290 7824 7824 G>T
source isSNP SNP00036337
consequence LIF_cds.1 291 3'

GIF LIF-genomic-rev.gif



LUM 

Full name : lumican 
Link : FL_2676170_link_genomic 
Subsequence GB:AC007115_1 180821

Subsequence GB:AC007115_1_3128106CD1 87417 92234 #294

Subsequence FL_3128106_mrna_build.1 84719 92839 #295

mRNA FL_3128106_mrna_build.1 1926 bp 3 exons #295

exon 84719 84998
exon 87396 88278
exon 92077 92839
CDS GB:AC007115_1_3128106CD1 1020 bp 2 exons

exon 87417 88278
exon 92077 92234
Allele GB:AC007115_1 293 89050 89050 A>G

source dbSNP gnl|dbSNP|ss852530_allele

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source dbSNP gnl | dbSNP | ss897 123_allele . 

consequence GB:AC007115_1_3128106CD1 294
Allele GB:AC007115_1 293 89249 89249 A>G

source dbSNP gnl|dbSNP|ss855039_allele

consequence GB:AC007115_1_3128106CD1 294
GIF LUM-genomic-fwd.gif



METTL1

Full name : methyltransferase-like 1
Link : METTL1_link_cdna

Subsequence GB:Y18643_1 1 1292 #296

CDS GB:Y18643_1.1 831 bp #297

ORF 49 879
Allele GB:Y18643_1 296 345 345
source isSNP SNP00098761
consequence GB:Y18643_1.1 297

Allele GB:Y18643_1 296 919 919 A>G
source isSNP SNP00003825
consequence GB:Y18643_1.1
GIF METTL1-cdna-fwd.gif



MMP1

Full name : matrix metalloproteinase 1
Link : MMP1_link_cdna

Subsequence EM:HSCOLL1 1 1970

Allele EM:HSCOLL1 298 383 383 A>G
source isSNP SNP00009627

Allele EM:HSCOLL1 298 714 714 A>G
source isSNP SNP00037857

Allele EM:HSCOLL1 298 745 745 A>G
source isSNP SNP00037858

Allele EM:HSCOLL1 298 1522 1522 A>G
source isSNP SNP00009628

Allele EM:HSCOLL1 298 1541 1541 A>G
source isSNP SNP00009629

Allele EM:HSCOLL1 298 1662 1662 A>G
source isSNP SNP00009630

Allele EM:HSCOLL1 298 1747 1747 A>G
source isSNP SNP00009631



Link : MMP1_link_genomic

Subsequence GB:HSU78045 1 81826 #299

Subsequence MMP1_cds.1 11905 4225 #300

Subsequence MMP1_mrna_build.1 11973 3733 #301

CDS MMP1_cds.1 1410 bp 10 exons #300

exon 11905 11801
exon 11314 11070
exon 10976 10828
exon 10603 10478
exon 9421 9266
exon 9105 8988
exon 6551 6418
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exon 5308 5146
exon 4619 4516
exon 4334 4225

mRNA MMP1_mrna_build.1

exon 11973 11801
exon 11314 11070
exon 10976 10828
exon 10603 10478
exon 9421 9266
exon 9105 8988
exon 6551 6418
exon 5308 5146
exon 4619 4516
exon 4334 3733



Allele GB:HSU78045 299 3956 3956
source isSNP SNP00009631
consequence MMP1_cds.1 300

Allele GB:HSU78045 299 4041 4041
source isSNP SNP00009630
consequence MMP1_cds.1 300

Allele GB:HSU78045 299 4162 4162
source isSNP SNP00009629
consequence MMP1_cds.1 300

Allele GB:HSU78045 299 4181 4181
source isSNP SNP00009628
consequence MMP1_cds.1 300

Allele GB:HSU78045 299 4517 4517 A>G
source wetSNP GB:HSU78045.v4517.A>G
consequence MMP1_cds.1 300 Silent 433-433

Allele GB:HSU78045 299 4661 4664 CATG>CG
source wetSNP GB:HSU78045.v4661.CATG>CG
consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 4677 4677 A>G
source wetSNP GB:HSU78045.v4677.G>A
consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 5198 5198 A>G
source wetSNP GB:HSU78045.v5198.A>G
consequence MMP1_cds.1 300 Missense 382-382

Allele GB:HSU78045 299 6586 6586 A>G
source wetSNP GB:HSU78045.v6586.T>C
consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 9056 9056 A>G
source wetSNP GB:HSU78045.v9056.C>T
consequence MMP1_cds.1 300 Silent 277-277

Allele GB:HSU78045 299 9120 9120 A>G
source wetSNP GB:HSU78045.v9120.A>G
consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 9126 9126 A>G
source wetSNP GB:HSU78045.v9126.G>A
consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 9205 9205 A>G
source wetSNP GB:HSU78045.v9205.T>C
consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 9247 9247 A>G
source wetSNP GB:HSU78045.v9247.T>C
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consequence MMP1_cds.1 300 Intron

Allele GB:HSU78045 299 9365 9365 G>T
source wetSNP GB:HSU78045.v9365.G>T
consequence MMP1_cds.1 300 Missense 228-228

Allele GB:HSU78045 299 9370 9370 A>G
source isSNP SNP00037858
consequence MMP1_cds.1 300 Missense 226-226

Allele GB:HSU78045 299 11105 11105 A>G
source isSNP SNP00009627
source wetSNP GB:HSU78045.v11105.C>T
consequence MMP1_cds.1 300 Silent 105-105
GIF MMP1-genomic-rev.gif




MMP13

Full name : MMP13

Link : MMP13_link_genomic

Subsequence MMP13_cds.1 159614 #302
Subsequence GB:AP000789_ 201766 #303
CDS MMP13_cds.1 957 bp #302

exon 141629 141779
exon 141956 142081
exon 144063 144224
exon 146009 146126
exon 147078 147211
exon 157208 157367
exon 159509 159614

Allele GB:AP000789_1 303 141614 141614 C>G
source wetSNP GB:AP000789_1.v141614.C>G
consequence MMP13_cds.1 302 5'

Allele GB:AP000789_1 303 141875 141875 G>T
source wetSNP GB:AP000789_1.v141875.C>A
consequence MMP13_cds.1 302 Intron

Allele GB:AP000789_1 303 147095 147095 A>G
source wetSNP GB:AP000789_1.v147095.A>G
consequence MMP13_cds.1 302 Missense 192-192

Allele GB:AP000789_1 303 157231 157231 C>G
source wetSNP GB:AP000789_1.v157231.G>C
consequence MMP13_cds.1 302 Missense 239-239

Allele GB:AP000789_1 303 157325 157325 A>G
source wetSNP GB:AP000789_1.v157325.A>G
consequence MMP13_cds.1 302 Missense 270-270

Allele GB:AP000789_1 303 159631 159631 A>G
source wetSNP GB:AP000789_1.v159631.C>T
consequence MMP13_cds.1 302 3'

Allele GB:AP000789_1 303 159644 159644 C>G
source wetSNP GB:AP000789_1.v159644.G>C
consequence MMP13_cds.1 302 3'
GIF MMP13-genomic-fwd.gif
302 3' 



MMP14 
Full n 
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Link : MMP14_link_cdna 

Subsequence GB:HUMMTMMP 1 #304
CDS GB:HUMMTMMP.1 1749 bp #305

ORF 112 1860
Allele GB:HUMMTMMP 304 133 133
source isSNP SNP00107954
consequence GB:HUMMTMMP.1 305

Allele GB:HUMMTMMP 304 580 580 A>G
source isSNP SNP00107955
consequence GB:HUMMTMMP.1 305

Allele GB:HUMMTMMP 304 888 888 C>G
source isSNP SNP00093383
consequence GB:HUMMTMMP.1 305

Allele GB:HUMMTMMP 304 966 966 A>G
source isSNP SNP00055171
consequence GB:HUMMTMMP.1 305

Allele GB:HUMMTMMP 304 1243 1243 A>G
source isSNP SNP00107956
consequence GB:HUMMTMMP.1 305

Allele GB:HUMMTMMP 304 1264 1264 C>G
source isSNP SNP00107957
consequence GB:HUMMTMMP.1 305

Allele GB:HUMMTMMP 304 1944 1944 A>G
source isSNP SNP00060446
consequence GB:HUMMTMMP.1
GIF MMP14-cdna-fwd.gif
Link : MMP14_link_genomic

Subsequence MMP14_cds.1 132034 141254 #306

Subsequence GB:AL133448_3 1 173805 #307

Subsequence MMP14_mrna_build.1 131922 142801

CDS MMP14_cds.1 1749 bp 10 exons #306

exon 132034 132141
exon 136706 136854
exon 137128 137250
exon 137625 137932
exon 138472 138633
exon 138925 139085
exon 139586 139724
exon 139845 139995
exon 140466 140581
exon 140923 141254

mRNA MMP14_mrna_build.1 3408 bp

exon 131922 132141
exon 136706 136854
exon 137128 137250
exon 137625 137932
exon 138472 138633
exon 138925 139085
exon 139586 139724
exon 139845 139995
exon 140466 140581
exon 140923 142801

Allele GB:AL133448_3 307 132055
source isSNP SNP00107954
consequence MMP14_cds.1 306
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Allele GB:AL133448_3 307 137049 137051 TTA>TA
source wetSNP GB:AL133448_3.v137049.TTA>TA
consequence MMP14_cds.1 306 Intron

Allele GB:AL133448_3 307 137713 137713 A>G
source isSNP SNP00107955
consequence MMP14_cds.1 306 Silent 157-157 L

Allele GB:AL133448_3 307 138406 138406 A>G
source wetSNP GB:AL133448_3.v138406.G>A
consequence MMP14_cds.1 306 Intron

Allele GB:AL133448_3 307 138560 138560 C>G
source isSNP SNP00093383
source wetSNP GB:AL133448_3.v138560.C>G
consequence MMP14_cds.1 306 Silent 259-259

Allele GB:AL133448_3 307 138653 138653 A>G
source wetSNP GB:AL133448_3.v138653.G>A
consequence MMP14_cds.1 306 Intron

Allele GB:AL133448_3 307 139639 139639 A>G
source wetSNP GB:AL133448_3.v139639.G>A
consequence MMP14_cds.1 306 Missense 355-355

Allele GB:AL133448_3 307 139981 139981 A>G
source wetSNP GB:AL133448_3.v139981.C>T
consequence MMP14_cds.1 306 Silent 429-429

Allele GB:AL133448_3 307 139986 139986 A>G
source wetSNP GB:AL133448_3.v139986.G>A
consequence MMP14_cds.1 306 Missense 431-431

Allele GB:AL133448_3 307 141337 141337 A>G
source isSNP SNP00060446
consequence MMP14_cds.1 306
GIF MMP14-genomic-fwd.gif



MMP2 

Link : MMP2_link_cdna 

Subsequence GB:HSMMPM2 1 3530 #309

CDS GB:HSMMPM2.1 2010 bp #310

ORF 49 2058

Allele GB:HSMMPM2 309 681 681
source isSNP SNP00100004
consequence GB:HSMMPM2.1 310

Allele GB:HSMMPM2 309 1835 1835 A>G
source isSNP SNP00100005
consequence GB:HSMMPM2.1 310

Allele GB:HSMMPM2 309 1851 1851 G>T
source isSNP SNP00075435
consequence GB:HSMMPM2.1 310

Allele GB:HSMMPM2 309 2717 2717 A>G
source isSNP SNP00024650
consequence GB:HSMMPM2.1 310

Allele GB:HSMMPM2 309 2922 2922 C>G
source isSNP SNP00024651
consequence GB:HSMMPM2.1
GIF MMP2-cdna-fwd.gif
Link : MMP2_link_genomic



Missense 



Subsequence MMP2_cds.1 175558
Subsequence GB:AC012182_3 1 190117 #312
Subsequence MMP2_mrna_build.1 175606 155007
CDS MMP2_cds.1 2010 bp 10 exons #311
exon 175558 175397
exon 164437 164289
exon 163643 163515
exon 162034 161727
exon 161372 161211
exon 160292 160039
exon 159678 159540
exon 158699 158549
exon 158397 158282
exon 156902 156463
mRNA MMP2_mrna_build.1
exon 175606 175397
exon 164437 164289
exon 163643 163515
exon 162034 161727
exon 161372 161211
exon 160292 160039
exon 159678 159540
exon 158699 158549
exon 158397 158282
exon 156902 155007



Allele GB:AC012182_3 312 155598 155598 C>G
source isSNP SNP00024651
consequence MMP2_cds.1 311 3'

Allele GB:AC012182_3 312 155804 155804 A>G
source isSNP SNP00024650
consequence MMP2_cds.1 311 3'

Allele GB:AC012182_3 312 156670 156670 G>T
source isSNP SNP00075435
consequence MMP2_cds.1 311 Missense 601-601

Allele GB:AC012182_3 312 156686 156686 A>G
source isSNP SNP00100005
consequence MMP2_cds.1 311 Missense 596-596

Allele GB:AC012182_3 312 161842 161842 A>G
source isSNP SNP00100004
consequence MMP2_cds.1 311 Silent 211-211

Allele GB:AC012182_3 312 163660 163660 A>G
source wetSNP GB:AC012182_3.v163660.G>A
consequence MMP2_cds.1 311 Intron
GIF MMP2-genomic-rev.gif



MMP3 

Full name : matrix metalloproteinase 3 
Link : MMP3_link_cdna 

Subsequence EM:HSSTROMR 1 1801 #314

Allele EM:HSSTROMR 314 331 331 A>G

source isSNP SNP00011525

Allele EM:HSSTROMR 314 382 382 A>G

source isSNP SNP00113489

Allele EM:HSSTROMR 314 713 713 A>G

source isSNP SNP00015044

Allele EM:HSSTROMR 314 976 976 A>G

source isSNP SNP00054705

Allele EM:HSSTROMR 314 1129 1129 A>G

source isSNP SNP00011527

Link : MMP3_link_genomic 

Subsequence EM:HSU78045 100 81925 #315
Subsequence MMP3_link_cds.1 57437 50020 #316
Subsequence MMP3_mrna_build.1 57480 49696 #317
CDS MMP3_link_cds.1 1434 bp 10 exons #316
exon 57437 57333
exon 56806 56562
exon 56469 56321
exon 56182 56057
exon 54487 54323
exon 54146 54002
exon 53137 53004
exon 52604 52445
exon 51295 51192
exon 50120 50020
mRNA MMP3_mrna_build.1 1801 bp 10 exons #317
exon 57480 57333
exon 56806 56562
exon 56469 56321
exon 56182 56057
exon 54487 54323
exon 54146 54002
exon 53137 53004
exon 52604 52445
exon 51295 51192
exon 50120 49696

Allele EM:HSU78045 315 52375 52375 A>G
source wetSNP EM:HSU78045.v52375.T>C
consequence MMP3_link_cds.1 316 Silent 400-400

Allele EM:HSU78045 315 52411 52411 A>G
source wetSNP EM:HSU78045.v52411.G>A
consequence MMP3_link_cds.1 316 Silent 388-388

Allele EM:HSU78045 315 52489 52489 A>G
source wetSNP EM:HSU78045.v52489.G>A
consequence MMP3_link_cds.1 316 Silent 362-362

Allele EM:HSU78045 315 52527 52530 GAGT>GT
source wetSNP EM:HSU78045.v52527.GAGT>GT
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 52586 52586 A>T
source wetSNP EM:HSU78045.v52586.T>A
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 53771 53771 A>T
source wetSNP EM:HSU78045.v53771.T>A
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 54077 54077 C>G
source wetSNP EM:HSU78045.v54077.C>G
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 54187 54187 A>G
source wetSNP EM:HSU78045.v54187.C>T
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 54402 54402 A>G
source wetSNP EM:HSU78045.v54402.C>T
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 56119 56119 A>G
source wetSNP EM:HSU78045.v56119.C>T
consequence MMP3_link_cds.1 316 Intron

Allele EM:HSU78045 315 56507 56507 C>G
source wetSNP EM:HSU78045.v56507.G>C
consequence MMP3_link_cds.1 316 Silent

Allele EM:HSU78045 315 56525 56525 A>G
source isSNP SNP00011525
source wetSNP EM:HSU78045.v56525.G>A
consequence MMP3_link_cds.1 316 Silent

Allele EM:HSU78045 315 56680 56680 A>G
source wetSNP EM:HSU78045.v56680.C>T
consequence MMP3_link_cds.1 316 Missense 45-45 E>K
GIF MMP3-genomic-rev.gif



MMP9 

Full name : matrix metalloproteinase 9 
Link : MMP9_link_cdna 

Subsequence FN:522678CB1 1 2348 #318
CDS FN:522678CB1.1 2124 bp #319
ORF 33 2156

Allele FN:522678CB1 318 308 308 A>G
source isSNP SNP00101082
consequence FN:522678CB1.1 319 Silent

Allele FN:522678CB1 318 413 413 A>G
source isSNP SNP00101083
consequence FN:522678CB1.1 319 Silent

Allele FN:522678CB1 318 534 534 A>G
source isSNP SNP00101084
consequence FN:522678CB1.1 319 Missense 168-168

Allele FN:522678CB1 318 591 591 A>G
source isSNP SNP00101085
consequence FN:522678CB1.1 319 Missense 187-187

Allele FN:522678CB1 318 719 719 A>G
source isSNP SNP00101086
consequence FN:522678CB1.1 319 Silent

Allele FN:522678CB1 318 748 748 A>G
source isSNP SNP00021346
consequence FN:522678CB1.1 319 Missense 239-239

Allele FN:522678CB1 318 868 868 A>G
source isSNP SNP00002987
consequence FN:522678CB1.1 319 Missense 279-279

Allele FN:522678CB1 318 1604 1604 A>G
source isSNP SNP00021347
consequence FN:522678CB1.1 319 Silent

Allele FN:522678CB1 318 1853 1853 G>T
source isSNP SNP00002988
consequence FN:522678CB1.1 319 Silent

Allele FN:522678CB1 318 2159 2159 A>G
source isSNP SNP00062663
consequence FN:522678CB1.1 319 3'

Allele FN:522678CB1 318 2302 2302 A>G
source isSNP SNP00021348
consequence FN:522678CB1.1 319 3'
GIF MMP9-cdna-fwd.gif








Link : MMP9_link_genomic

Subsequence GB:HUMIVCOL01 1 764 #320
Subsequence GB:HUMIVCOL02 865 1117 #321
Subsequence GB:HUMIVCOL03 1218 1386 #322
Subsequence GB:HUMIVCOL04 1487 1635 #323
Subsequence GB:HUMIVCOL05 1736 1929 #324
Subsequence GB:HUMIVCOL06 2030 2223 #325
Subsequence GB:HUMIVCOL07 2324 2520 #326
Subsequence GB:HUMIVCOL08 2621 2796 #327
Subsequence GB:HUMIVCOL09 2897 3196 #328
Subsequence GB:HUMIVCOL10 3297 3456 #329
Subsequence GB:HUMIVCOL11 3557 3727 #330
Subsequence GB:HUMIVCOL12 3828 3951 #331
Subsequence GB:HUMIVCOL13 4052 4371 #332
Subsequence MMP9_cds.1 619 4180 #333
Subsequence MMP9_mrna_build.1 587 4371 #334
CDS MMP9_cds.1 2124 bp 13 exons #333
exon 619 756
exon 875 1107
exon 1228 1376
exon 1497 1625
exon 1746 1919
exon 2040 2213
exon 2334 2510
exon 2631 2786
exon 2907 3186
exon 3307 3446
exon 3567 3717
exon 3838 3941
exon 4062 4180
mRNA MMP9_mrna_build.1 2348 bp
exon 587 756
exon 875 1107
exon 1228 1376
exon 1497 1625
exon 1746 1919
exon 2040 2213
exon 2334 2510
exon 2631 2786
exon 2907 3186
exon 3307 3446
exon 3567 3717
exon 3838 3941
exon 4061 4371



Allele GB:HUMIVCOL01 320 677 677 A>G
source wetSNP GB:HUMIVCOL01.v677.C>T
consequence MMP9_cds.1 333 Missense 20-20 A>V

Allele GB:HUMIVCOL02 321 148 148 A>G
source isSNP SNP00101082
consequence MMP9_cds.1 333 Silent 92-92 K

Allele GB:HUMIVCOL04 323 49 49 A>G
source isSNP SNP00101085
consequence MMP9_cds.1 333 Missense

Allele GB:HUMIVCOL05 324 48 48 A>G
source isSNP SNP00101086
consequence MMP9_cds.1 333

Allele GB:HUMIVCOL05 324 77 77 A>G
source isSNP SNP00021346
consequence MMP9_cds.1 333 Missense

Allele GB:HUMIVCOL09 328 252 252 A>G
source isSNP SNP00021347
consequence MMP9_cds.1 333 Silent

Allele GB:HUMIVCOL11 330 81 81 G>T
source isSNP SNP00002988
consequence MMP9_cds.1 333 Silent 607-607

Allele GB:HUMIVCOL13 332 87 87 A>G
source wetSNP GB:HUMIVCOL13.v87.G>A
consequence MMP9_cds.1 333 Silent 694-694

Allele GB:HUMIVCOL13 332 132 132 A>G
source wetSNP GB:HUMIVCOL13.v132.C>T
consequence MMP9_cds.1 333 3'

Allele GB:HUMIVCOL13 332 274 274 A>G
source isSNP SNP00021348
consequence MMP9_cds.1 333
GIF MMP9-genomic-fwd.gif



MSF 

Full name : megakaryocyte stimulating factor 

Link : MSF_link_cdna 

Subsequence GB:NM_005807 1 5041 #335
CDS GB:NM_005807.1 4215 bp #336
ORF 34 4248

Allele GB:NM_005807 335 1011 1011 A>G
source isSNP SNP00064566
consequence GB:NM_005807.1 336 Silent

Allele GB:NM_005807 335 2650 2650 A>G
source isSNP SNP00108532
consequence GB:NM_005807.1 336 Missense

Allele GB:NM_005807 335 3171 3171 A>G
source isSNP SNP00009620
consequence GB:NM_005807.1 336 Silent

Allele GB:NM_005807 335 4187 4187 A>G
source isSNP SNP00061665
consequence GB:NM_005807.1 336 Missense 1385-1385

Allele GB:NM_005807 335 4760 4760 A>G
source isSNP SNP00009621
consequence GB:NM_005807.1
GIF MSF-cdna-fwd.gif
Link : MSF_link_genomic

Subsequence MSF_cds.1 181003 197905 #337
Subsequence MSF_cds.2 181003 197905 #338
Subsequence MSF_cds.3 181003 197905 #339
Subsequence MSF_cds.4 181003 197905 #340
Subsequence GB:AL133553_7 1 214019 #341
Subsequence MSF_mrna_build.1 180982 198681 #342
CDS MSF_cds.3 3936 bp 10 exons #339
exon 181003 181078
exon 184218 184340
exon 185719 185838
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
mRNA MSF_mrna_build.1
exon 180982 181078
exon 184218 184340
exon 185719 185838
exon 188235 188384
exon 188921 189049
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 198681
CDS MSF_cds.4 3813 bp
exon 181003 181078
exon 185719 185838
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
CDS MSF_cds.1 4215 bp 12 exons
exon 181003 181078
exon 184218 184340
exon 185719 185838
exon 188235 188384
exon 188921 189049
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905
CDS MSF_cds.2 4092 bp 11 exons
exon 181003 181078
exon 185719 185838
exon 188235 188384
exon 188921 189049
exon 190445 193267
exon 193920 193997
exon 195161 195297
exon 195567 195723
exon 196302 196499
exon 196896 197021
exon 197808 197905

Allele GB:AL133553_7 341 190505 190505 G>T
source wetSNP GB:AL133553_7.v190505.A>C
consequence MSF_cds.3 339 Missense 127-127 D>A
consequence MSF_cds.4 340 Missense 86-86 D>A
consequence MSF_cds.1 337 Missense 220-220 D>A
consequence MSF_cds.2 338 Missense 179-179 D>A

Allele GB:AL133553_7 341 190559 190559 A>G
source wetSNP GB:AL133553_7.v190559.C>T
consequence MSF_cds.3 339 Missense 145-145 T>M
consequence MSF_cds.4 340 Missense 104-104 T>M
consequence MSF_cds.1 337 Missense 238-238 T>M
consequence MSF_cds.2 338 Missense 197-197 T>M

Allele GB:AL133553_7 341 190755 190755 A>G
source wetSNP GB:AL133553_7.v190755.G>A
consequence MSF_cds.3 339 Silent 210-210
consequence MSF_cds.4 340 Silent 169-169
consequence MSF_cds.1 337 Silent 303-303
consequence MSF_cds.2 338 Silent 262-262

Allele GB:AL133553_7 341 190824 190824 A>G
source isSNP SNP00064566
consequence MSF_cds.3 339 Silent 233-233
consequence MSF_cds.4 340 Silent 192-192
consequence MSF_cds.1 337 Silent 326-326
consequence MSF_cds.2 338 Silent 285-285

Allele GB:AL133553_7 341 192463 192463 A>G
source isSNP SNP00108532
consequence MSF_cds.3 339 Missense 780-780 P>S
consequence MSF_cds.4 340 Missense 739-739 P>S
consequence MSF_cds.1 337 Missense 873-873 P>S
consequence MSF_cds.2 338 Missense 832-832 P>S

Allele GB:AL133553_7 341 192984 192984 A>G
source isSNP SNP00009620
consequence MSF_cds.3 339 Silent 953-953
consequence MSF_cds.4 340 Silent 912-912
consequence MSF_cds.1 337 Silent 1046-1046
consequence MSF_cds.2 338 Silent 1005-1005

Allele GB:AL133553_7 341 193235 193235 A>G
source wetSNP GB:AL133553_7.v193235.A>G
consequence MSF_cds.3 339 Missense 1037-1037 N>S
consequence MSF_cds.4 340 Missense 996-996 N>S
consequence MSF_cds.1 337 Missense 1130-1130 N>S
consequence MSF_cds.2 338 Missense 1089-1089 N>S

Allele GB:AL133553_7 341 193258 193258 A>G
source wetSNP GB:AL133553_7.v193258.A>G
consequence MSF_cds.3 339 Missense 1045-1045 M>V
consequence MSF_cds.4 340 Missense 1004-1004 M>V
consequence MSF_cds.1 337 Missense 1138-1138 M>V
consequence MSF_cds.2 338 Missense 1097-1097 M>V

Allele GB:AL133553_7 341 196691 196691 G>T
source isSNP SNP00023429
consequence MSF_cds.3 339 Intron
consequence MSF_cds.4 340 Intron
consequence MSF_cds.1 337 Intron
consequence MSF_cds.2 338 Intron

Allele GB:AL133553_7 341 197844 197844 A>G

source isSNP SNP00061665

consequence MSF_cds.3 339 Missense 1292-1292 A>V

consequence MSF_cds.4 340 Missense 1251-1251 A>V

consequence MSF_cds.1 337 Missense 1385-1385 A>V

consequence MSF_cds.2 338 Missense 1344-1344 A>V

Allele GB:AL133553_7 341 198417 198417 A>G

source isSNP SNP00009621

consequence MSF_cds.3 339 3'

consequence MSF_cds.4 340 3'

consequence MSF_cds.1 337 3'

consequence MSF_cds.2 338 3'

GIF MSF-genomic-fwd.gif 



NCOR2

Full name : nuclear receptor co-repressor 2
Link : NCOR2_link_cdna

Subsequence GB:AF125672 1 8686 #343
CDS GB:AF125672.1 7524 bp #344
ORF 157 7680

Allele GB:AF125672 343 165 165 G>T
source isSNP SNP00035702
consequence GB:AF125672.1 344 Silent 3-3 G

Allele GB:AF125672 343 618 618 A>G
source isSNP SNP00105557
consequence GB:AF125672.1 344 Silent 154-154 P

Allele GB:AF125672 343 2859 2859 A>G
source isSNP SNP00101011
consequence GB:AF125672.1 344 Silent 901-901 A

Allele GB:AF125672 343 4728 4728 A>G
source isSNP SNP00075034
consequence GB:AF125672.1 344 Silent 1524-1524 G

Allele GB:AF125672 343 4749 4749 A>G
source isSNP SNP00069757
consequence GB:AF125672.1 344 Silent 1531-1531 X

Allele GB:AF125672 343 4957 4957 A>G
source isSNP SNP00101012
consequence GB:AF125672.1 344 Missense 1601-1601 Y>H

Allele GB:AF125672 343 5085 5085 A>G
source isSNP SNP00075035
consequence GB:AF125672.1 344 Silent 1643-1643 R

Allele GB:AF125672 343 5100 5100 A>G
source isSNP SNP00075036
consequence GB:AF125672.1 344 Silent 1648-1648 N

Allele GB:AF125672 343 5221 5221 A>G
source isSNP SNP00012485
consequence GB:AF125672.1 344 Missense 1689-1689 T>A

Allele GB:AF125672 343 7405 7405 A>G
source isSNP SNP00015859
consequence GB:AF125672.1 344

Allele GB:AF125672 343 7431 7431 A>G
source isSNP SNP00101013
consequence GB:AF125672.1 344 2425-2425

Allele GB:AF125672 343 7751 7751 A>G
source isSNP SNP00101014
consequence GB:AF125672.1 344

Allele GB:AF125672 343 8597 8597 A>G
source isSNP SNP00062569
consequence GB:AF125672.1 344

Allele GB:AF125672 343 8602 8602 A>G
source isSNP SNP00012487
consequence GB:AF125672.1



GIF NCOR2-cdna-fwd.gif 



NOG 

Full name : NOG

Link : NOG_link_genomic

Subsequence GB:AC005553 1 179651 #345

Subsequence NOG_cds.1 146202 145504 #346

Subsequence NOG_mrna_build.1 147012 145466

CDS NOG_cds.1 699 bp 1 exon #346

exon 146202 145504
mRNA NOG_mrna_build.1 1547 bp 1 exon #347

exon 147012 145466
Allele GB:AC005553 345 145585 145585 A>G

source wetSNP GB:AC005553.v145585.G>A

consequence NOG_cds.1 346 Silent 206-206
GIF NOG-genomic-rev.gif



NOTCH3

Link : NOTCH3_link_cdna

Subsequence GB:NOTCH3 1 8091 #348
CDS GB:NOTCH3.1 6966 bp #349
ORF 79 7044

Allele GB:NOTCH3 348 1218 1218 A>G
source isSNP SNP00116668
consequence GB:NOTCH3.1 349 Silent

Allele GB:NOTCH3 348 1565 1565 A>G
source isSNP SNP00116669
consequence GB:NOTCH3.1 349 Missense

Allele GB:NOTCH3 348 2616 2616 A>G
source isSNP SNP00116670
consequence GB:NOTCH3.1 349 Silent

Allele GB:NOTCH3 348 4520 4520 A>G
source isSNP SNP00116671
consequence GB:NOTCH3.1 349 Missense

Allele GB:NOTCH3 348 5740 5740 A>G
source isSNP SNP00054178
consequence GB:NOTCH3.1 1888-1888

Allele GB:NOTCH3 348 6355 6355 A>G
source isSNP SNP00037780
consequence GB:NOTCH3.1 349 Missense 2093-2093

Allele GB:NOTCH3 348 6516 6516 A>G
source isSNP SNP00054179
consequence GB:NOTCH3.1 349 Silent

Allele GB:NOTCH3 348 6746 6746 A>G
source isSNP SNP00048081
consequence GB:NOTCH3.1 349 Missense

Allele GB:NOTCH3 348 7733 7733 A>G
source isSNP SNP00037781
consequence GB:NOTCH3.1 349

Allele GB:NOTCH3 348 7881 7881 A>G
source isSNP SNP00062225
consequence GB:NOTCH3.1 349 3'

Allele GB:NOTCH3 348 7914 7914 A>G
source isSNP SNP00066446
consequence GB:NOTCH3.1 349 3'

Allele GB:NOTCH3 348 8023 8023 A>G
source isSNP SNP00066447
consequence GB:NOTCH3.1 349 3'
GIF NOTCH3-cdna-fwd.gif
Link : NOTCH3_link_genomic

Subsequence NOTCH3_cds.1 40735 3819 #350
Subsequence GB:AC004663_1 1 41150 #351
CDS NOTCH3_cds.1 6846 bp 32 exons
exon 40733 40657
exon 35676 35534
exon 35455 35117
exon 35024 34902
exon 34814 34581
exon 32585 32430
exon 32331 32146
exon 31505 31392
exon 31151 31038
exon 30495 30262
exon 30145 30035
exon 28836 28644
exon 28565 28414
exon 28176 28063
exon 27607 27452
exon 24958 24733
exon 24319 24118
exon 23985 23838
exon 23413 23229
exon 22653 22521
exon 22439 22182
exon 22098 21980
exon 21247 20682
exon 17557 17225
exon 13982 13828
exon 13710 13488
exon 13327 13243
exon 10568 10406
exon 9248 8944
exon 8672 8525
exon 5719 5622
exon 4871 3819
Allele GB:AC004663_1 351 3796 3796
source wetSNP GB:AC004663_1.v3796.A>T
consequence NOTCH3_cds.1 350

Allele GB:AC004663_1 351 4117 4117 A>G
source isSNP SNP00048081
consequence NOTCH3_cds.1 350

Allele GB:AC004663_1 351 4347 4347 A>G
source isSNP SNP00054179
consequence NOTCH3_cds.1 350 Silent

Allele GB:AC004663_1 351 4508 4508 A>G
source isSNP SNP00037780
consequence NOTCH3_cds.1 350 Missense

Allele GB:AC004663_1 351 5727 5727 A>G
source wetSNP GB:AC004663_1.v5727.A>G
consequence NOTCH3_cds.1 350 Intron

Allele GB:AC004663_1 351 5943 5943 A>G
source dbSNP gnl|dbSNP|ss730238_allele
consequence NOTCH3_cds.1 350 Intron

Allele GB:AC004663_1 351 17519 17519 A>G
source isSNP SNP00116671
consequence NOTCH3_cds.1 350 Missense 1441-1441

Allele GB:AC004663_1 351 18749 18749 A>G
source dbSNP gnl|dbSNP|ss680542_allele
source dbSNP gnl|dbSNP|ss1143619_allele
source dbSNP gnl|dbSNP|ss372819_allele
consequence NOTCH3_cds.1 350 Intron

Allele GB:AC004663_1 351 22353 22353 A>G
source wetSNP GB:AC004663_1.v22353.C>T
consequence NOTCH3_cds.1 350 Missense

Allele GB:AC004663_1 351 23922 23922 C>G
source wetSNP GB:AC004663_1.v23922.C>G
consequence NOTCH3_cds.1 350 Missense

Allele GB:AC004663_1 351 24045 24045 A>G
source wetSNP GB:AC004663_1.v24045.T>C
consequence NOTCH3_cds.1 350 Intron

Allele GB:AC004663_1 351 27480 27480 A>G
source isSNP SNP00116670
consequence NOTCH3_cds.1 350 Silent

Allele GB:AC004663_1 351 28173 28173 A>G
source wetSNP GB:AC004663_1.v28173.C>T
consequence NOTCH3_cds.1 350 Missense

Allele GB:AC004663_1 351 28749 28749 A>G
source wetSNP GB:AC004663_1.v28749.C>T
consequence NOTCH3_cds.1 350 Missense

Allele GB:AC004663_1 351 29997 29997 C>G
source wetSNP GB:AC004663_1.v29997.G>C
consequence NOTCH3_cds.1 350 Intron

Allele GB:AC004663_1 351 32482 32482 A>G
source isSNP SNP00116668
consequence NOTCH3_cds.1 350 Silent
GIF NOTCH3-genomic-rev.gif
NPR2

Full name : Atrionatriuretic Peptide Receptor Type B
Link : NPR2_link_cdna

Subsequence GB:HUMGUANCYC 1 4081 #352

CDS GB:HUMGUANCYC.2 3144 bp #353

ORF 651 3794
Allele GB:HUMGUANCYC 352 2222 2222 A>G

source isSNP SNP00028343

consequence GB:HUMGUANCYC.2 353 Silent
GIF NPR2-cdna-fwd.gif



OGN

Full name : osteoglycin
Link : OGN_link_cdna

Subsequence GB:HSM801395 1 2101 #354
CDS GB:HSM801395.1 441 bp #355
ORF 1 441

Allele GB:HSM801395 354 64 64 A>G
source isSNP SNP00100803
consequence GB:HSM801395.1 355 Missense 22-22 L>F

Allele GB:HSM801395 354 909 909 A>G
source isSNP SNP00011097
consequence GB:HSM801395.1 355
GIF OGN-cdna-fwd.gif
Link : OGN_link_genomic

Subsequence OGN_cds.2 48897 32003 #356
Subsequence GB:AL354924_2 1 192427 #357
Subsequence OGN_mrna_build.2 50083 30350 #358
mRNA OGN_mrna_build.2 2726 bp 7 exons
exon 50083 49983
exon 48969 48721
exon 46672 46579
exon 38619 38461
exon 35431 35229
exon 32679 32584
exon 32173 30350
CDS OGN_cds.2 900 bp
exon 48897 48721
exon 46672 46579
exon 38619 38461
exon 35431 35229
exon 32679 32584
exon 32173 32003

Allele GB:AL354924_2 357 31535 31535 A>G
source isSNP SNP00011097
consequence OGN_cds.2 356 3'

Allele GB:AL354924_2 357 35339 35339 A>G
source isSNP SNP00100803
consequence OGN_cds.2 356 Missense
GIF OGN-genomic-rev.gif



OMD

Full name : osteomodulin
Link : OMD_link_cdna

Subsequence GB:OMD 1 2263 #359
CDS GB:OMD.1 1266 bp #360
ORF 101 1366

Allele GB:OMD 359 159 159 C>G
source isSNP SNP00023658
consequence GB:OMD.1 360 Missense

Allele GB:OMD 359 762 762 A>G
source isSNP SNP00023659
consequence GB:OMD.1 360 Missense

Allele GB:OMD 359 1969 1969 A>G
source isSNP SNP00023660
consequence GB:OMD.1 360

Allele GB:OMD 359 2071 2071 G>T
source isSNP SNP00106046
consequence GB:OMD.1 360 3'

GIF OMD-cdna-fwd.gif
Link : FL_1258977_link_genomic

Subsequence GB:AB009589 1 12414 #361
Subsequence GB:AB009589_1258977CD1 8540 10946 #362
Subsequence FL_1258977_mrna_build.1 1685 11855 #363
mRNA FL_1258977_mrna_build.1 2396 bp 3 exons
exon 1685 1892
exon 8524 9479
exon 10624 11855
CDS GB:AB009589_1258977CD1 1263 bp 2 exons
exon 8540 9479
exon 10624 10946

Allele GB:AB009589 361 8598 8598 C>G
source isSNP SNP00023658
consequence GB:AB009589_1258977CD1 362 Missense 20-20 C>S

Allele GB:AB009589 361 9201 9201 A>G
source isSNP SNP00023659
consequence GB:AB009589_1258977CD1 362 Missense S>N

Allele GB:AB009589 361 10042 10042 A>G
source dbSNP gnl|dbSNP|sB312223_allele
consequence GB:AB009589_1258977CD1 362 Intron

Allele GB:AB009589 361 10596 10596 A>G
source wetSNP GB:AB009589.v10596.A>G
consequence GB:AB009589_1258977CD1 362 Intron

Allele GB:AB009589 361 11552 11552 A>G
source isSNP SNP00023660
consequence GB:AB009589_1258977CD1 362 3'

Allele GB:AB009589 361 11654 11654 G>T
source isSNP SNP00106046
consequence GB:AB009589_1258977CD1 362
GIF OMD-genomic-fwd.gif



PDCD6IP

Full name : programmed cell death 6-interacting protein
Link : PDCD6IP_link_cdna

Subsequence GB:AF151793 #364
CDS GB:AF151793.1 2607 bp #365
ORF 127 2733

Allele GB:AF151793 364 1051 1051
source isSNP SNP00029958
consequence GB:AF151793.1 365 Missense 309-309

Allele GB:AF151793 364 1258 1258 A>G
source isSNP SNP00108790
consequence GB:AF151793.1 365 Missense 378-378

Allele GB:AF151793 364 1298 1298 G>T
source isSNP SNP00108791
consequence GB:AF151793.1 365 Missense

Allele GB:AF151793 364 1695 1695 A>G
source isSNP SNP00093444
consequence GB:AF151793.1 365 Silent

Allele GB:AF151793 364 2230 2230 A>G
source isSNP SNP00121559
consequence GB:AF151793.1 365 Missense 702-702

Allele GB:AF151793 364 2315 2315 A>G
source isSNP SNP000066b4
consequence GB:AF151793.1 365 Missense 730-730

Allele GB:AF151793 364 2386 2386 A>G
source isSNP SNP00029960
consequence GB:AF151793.1 365 Missense 754-754

Allele GB:AF151793 364 2421 2421 A>G
source isSNP SNP00121560
consequence GB:AF151793.1 365 Silent
GIF PDCD6IP-cdna-fwd.gif



PDNP1

Full name : phosphodiesterase I/nucleotide pyrophosphatase 1 (homologous to mouse Ly-41 antigen)
Link : PDNP1_link_cdna

Subsequence EM:HSAUTOTAX 1 3231 #366
CDS EM:HSAUTOTAX.2 2748 bp #367
ORF 50 2797

Allele EM:HSAUTOTAX 366 342 342 A>G
source isSNP SNP00025434
consequence EM:HSAUTOTAX.2 367 Missense

Allele EM:HSAUTOTAX 366 696 696 A>G
source isSNP SNP00075872
consequence EM:HSAUTOTAX.2 367 Missense

Allele EM:HSAUTOTAX 366 1682 1682 A>G
source isSNP SNP00025435
consequence EM:HSAUTOTAX.2 367 Missense

Allele EM:HSAUTOTAX 366 1789 1789 A>G
source isSNP SNP00004604
consequence EM:HSAUTOTAX.2 367 Silent

Allele EM:HSAUTOTAX 366 2398 2398 G>T
source isSNP SNP00122211
consequence EM:HSAUTOTAX.2 367 Silent

Allele EM:HSAUTOTAX 366 2539 2539 A>G
source isSNP SNP00004605
consequence EM:HSAUTOTAX.2 367 Silent

Allele EM:HSAUTOTAX 366 2681 2681 G>T
source isSNP SNP00059344
consequence EM:HSAUTOTAX.2
GIF PDNP1-cdna-fwd.gif
Link : PDNP1_link_genomic



Subsequence IN:98092911313498 4217 4948 #368
Subsequence IN:98061109562226435 5050 5980 #369
Subsequence IN:98092910591328158 3611 4115 #370
Subsequence IN:98092911013628201 100 699 #371
Subsequence IN:98092911024828217 2027 2526 #372
Subsequence IN:98092911044928261 3068 3509 #373
Subsequence IN:98092911065328292 801 1418 #374
Subsequence IN:98092913141116289 6183 6572 #375
Subsequence IN:98111010592914993 1520 1926 #376
Subsequence IN:98111011021915028 2628 2967 #377

Allele IN:98092910591328158 370 232 232 A>G
source isSNP SNP00025435

Allele IN:98092913141116289 375 189 189 G>T
source isSNP SNP00059344



PLA2G2A 

Full name : phospholipase A2, group IIA 
Link : PLA2G2A_link_cdna 

Subsequence GB:HUMRASFAB 1 854 #378
CDS GB:HUMRASFAB.1 435 bp #379
ORF 136 570

Allele GB:HUMRASFAB 378 267 267 A>G
source isSNP SNP00010003
consequence GB:HUMRASFAB.1 379 Silent 44-44

Allele GB:HUMRASFAB 378 800 800 A>G
source isSNP SNP00021612
consequence GB:HUMRASFAB.1 379 3'

GIF PLA2G2A-cdna-fwd.gif
Link : PLA2G2A_link_genomic

Subsequence PLA2G2A_cds.1 51704 48629 #380
Subsequence PLA2G2A_mrna_build.1 52537 48418 #381
Subsequence GB:AL358253_1 1 180550 #382
Subsequence LG:474322.13_mrna_build.1 52786 48418 #383
Subsequence PLA2G2A_cds.2 51704 50985 #384
mRNA LG:474322.13_mrna_build.1 1028 bp 5 exons
exon 52786 52511
exon 51810 51665
exon 51455 51311
exon 51052 50946
exon 48771 48418
CDS PLA2G2A_cds.1 435 bp 4 exons #380
exon 51704 51665
exon 51455 51311
exon 51052 50946
exon 48771 48629
CDS PLA2G2A_cds.2 108 bp
exon 51704 51665
exon 51052 50985
mRNA PLA2G2A_mrna_build.1 779 bp 5 exons
exon 52537 52511
exon 51810 51665
exon 51455 51311
exon 51052 50946
exon 48771 48418

Allele GB:AL358253_1 382 51364 51364 A>G
source isSNP SNP00010003
consequence PLA2G2A_cds.1 380 Silent
consequence PLA2G2A_cds.2 384 Intron

Allele GB:AL358253_1 382 52584 52584 C>G
source isSNP SNP00021611
consequence PLA2G2A_cds.1 380 5'
consequence PLA2G2A_cds.2 384 5'

GIF PLA2G2A-genomic-rev.gif



PPP1R5

Full name : protein phosphatase 1, regulatory (inhibitor) subunit 5
Link : PPP1R5_link_cdna

Subsequence GB:Y18207_1 1 #385
CDS GB:Y18207_1.1 954 bp #386
ORF 92 1045

Allele GB:Y18207_1 385 571 571 A>G
source isSNP SNP00041149
consequence GB:Y18207_1.1 386

Allele GB:Y18207_1 385 1096 1096 G>T
source isSNP SNP00060710
consequence GB:Y18207_1.1
GIF PPP1R5-cdna-fwd.gif
Link : PPP1R5_link_genomic

Subsequence GB:AC020691_2 1 152048 #387
Subsequence PPP1R5_mrna_build.1 103997 107245 #388
Subsequence PPP1R5_cds.1 106194 107132 #389
CDS PPP1R5_cds.1 939 bp 1 exon #389
exon 106194 107132
mRNA PPP1R5_mrna_build.1 1160 bp 2 exons #388
exon 103997 104103
exon 106193 107245

Allele GB:AC020691_2 387 106523 106523 G>T
source wetSNP GB:AC020691_2.v106523.T>G
consequence PPP1R5_cds.1 389 Missense 110-110

Allele GB:AC020691_2 387 106658 106658 A>G
source isSNP SNP00041149
consequence PPP1R5_cds.1 389 Silent 155-155

Allele GB:AC020691_2 387 107183 107183 G>T
source isSNP SNP00060710
consequence PPP1R5_cds.1 389
GIF PPP1R5-genomic-fwd.gif

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PRELP 

Full name : proline arginine-rich end leucine-rich' repeat protein 
Link : PRELP_linK_cdna 

Subsequence GB:HSU29089 1 1560 

CaDS QB:HSU29089,1 1149 bp 

ORF 129 1277 

QB:HSU29089 390 1170 1170 
isSNP SNP00001359 
QB:HSU29089.1 
390 1489 1489 
isSNP SNP00001361 
GB:HSU29089.1 



Allele 



Allele 



consequence 
GB:HSy29089 
source 



consequence 
GIF PRELP-cdna-fwd.gif 
Link : PRELP_link_genoinic 



#390 
#391 



G>T 



391 
G>T 



391 



Missense 



PRELP_cds.l 82496 86192 #392 
GB:AC022000_1 1 154681 

PRELP_inma_build.l 75139 86474 

1149 bp 2 axons #392 

83468 
86017 86192 

PRELP_inma_build.l 1559 bp 3 exor 

75139 75250 
82480 83468 
86017 86474 

GB:AC022000_1 393 86085 86085 G>T 

source isSNP SNP00001359 

consequence PRELP_cds.1 392 Missense
Allele GB:AC022000_1 393 86404 86404 G>T

source isSNP SNP00001361

consequence PRELP_cds.1 392 3'
GIF PRELP-genomic-fwd.gif 



Subsequence 
Subsequence 
Subsequence 
CDS PRELP_cds.1

exon 82496

exon 



exon 
exon 
Allele 



#393 
#394 



PRSS11

Full name : serine protease.
Link : FL_1787335_link_cdna

Subsequence FN:1787335CB1 1

CDS FN:1787335CB1.1 1443 bp 
ORF 49 1491 

Allele FN:1787335CB1 395 150 

source isSNP SNP00058999 

consequence FN:1787335CB1.1

Allele FN:1787335CB1 395 156 

source isSNP SNP00117078 

consequence FN:1787335CB1.1

Allele FN:1787335CB1 395 914 

source isSNP SNP00120314

consequence FN:1787335CB1.1

Allele FN:1787335CB1 395 1321 

source isSNP SNP00105589 

consequence FN:1787335CB1.1

Allele FN:1787335CB1 395 1521 

source isSNP SNP00105590 

consequence 



2054 
#396 



396 
156 



396 
914 



396 
1321 



396 
1521 



Silent 
G>T 



Silent 
A>G 



Missense 
C>G 









TABLE 1 (Cont.) 



GIF PRSS11-cdna-fwd.gif
Link : FL_1787335_link_genomic

Subsequence GB:AF157623_1_1787335CD1 17526 70213 #397

Subsequence GB:AF157623_1 1 79597 #398

Subsequence FL_1787335_mrna_build.1 17478 70761 #399

CDS GB:AF157623_1_1787335CD1 1443 bp 9 exons #397 

exon 17526 17997 
44770 44869 
45290 45494 
62561 62755 
63240 63272 
64526 64640 
65966 66023 
67827 67922 
70045 70213 

FL_1787335_mrna_build.1 2039 bp 9 exons #399

17478 17997 
44770 44869 
45290 45494 
62561 62755 
63240 63272 
64525 64640 
65966 66023 
67827 67922 
70045 70761 

GB:AF157623_1 398 17627 17627 A>G 

source isSNP SNP00068999 

consequence GB:AF157623_1_1787335CD1 397 Silent



mRNA



exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 

exon 
exon 
exon 
exon 
exon 
exon 
exon 



exon 
Allele 



34-34 A 
Allele 



36-36 G 
Allele 



GB:AF157623_ 

source 

consequence 



.1 398 17633 17633 G>T 

isSNP SNP00117078 
GB:AF157623_1_1787335CD1 



397 



397 



Intron 



Intron 



L 398 21721 21721 A>G 

isSNP SNP00101582 
GB:AF157623_1_1787335CD1 
GB:AF157623_1 398 35790 35790 A>G

source isSNP SNP00049308

consequence GB:AF157623_1_1787335CD1
GB:AF157623_1 398 44762 44762 G>T

source wetSNP GB:AF157623_1.v44762.G>T

consequence GB:AF157623_1_1787335CD1 397 Intron

GB:AF157623_1 398 45470 45470 A>G

source wetSNP GB:AF157623_1.v45470.C>T

consequence GB:AF157623_1_1787335CD1 397 Silent



GB:AF157623. 

source 
consequence 



251-251 
Allele 



GB:AF157623_1 398 
source wetSNP 
consequence GB:AF157623. 



45587 45587 A>G
GB:AF157623_1.v45587.C>T
1_1787335CD1 397 Intron

GB:AF157623_1 398 47792 47792 A>G

source isSNP SNP00105588

consequence GB:AF157623_1_1787335CD1 397 Intron

GB:AF157623_1 398 47834 47834 A>G 

isSNP SNP00120312 

GB:AF157623_1_1787335CD1 397 Intron 




consequence 
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GB:AF157623_1 



398 



47913 47913 A>G 



289-289 
Allele 



source isSNP SNP00120313 

consequence GB:AF157623_1_1787335CD1 397 Intron

GB:AF157623_1 398 62541 62541 A>G

source wetSNP GB:AF157623_1.v62541.G>A

consequence GB:AF157623_1_1787335CD1 397 Intron

GB:AF157623_1 398 62545 62545 A>G

source wetSNP GB:AF157623_1.v62545.G>A

consequence GB:AF157623_1_1787335CD1 397 Intron

GB:AF157623_1 398 62649 62649 A>G

source isSNP SNP00120314

consequence GB:AF157623_1_1787335CD1 397 Missense

Q>R

GB:AF157623_1 398 63355 63360 TGTTTT>TT

source wetSNP GB:AF157623_1.v63355.TGTTTT>TT

consequence GB:AF157623_1_1787335CD1 397 Intron

GB:AF157623_1 398 70243 70243 A>G

source isSNP SNP00105590

consequence GB:AF157623_1_1787335CD1 397 3'



GIF PRSS11-genomic-fwd.gif



PTGS2 

Full name : Prostaglandin-endoperoxide Synthase 2
Link : PTGS2_link_cdna

Subsequence EM:HSCYCLOX 1 3387 #400



Allele 


EM:HSCYCLOX 


400 


403 403 


C>G




source 


isSNP 


SNP00046167 




Allele


EM:HSCYCLOX 


400 


880 880 


G>T 




source 


isSNP 


SNP00076329 




Allele 


EM:HSCYCLOX


400 


2033 2033 


A>G 




source


isSNP 


SNP00076330 




Allele 


EM:HSCYCLOX 


400 


2300 2300 


A>G 




source 


isSNP 


SNP00046168 




Allele 


EM:HSCYCLOX 


400 


2983 2983 


A>G 




source 


isSNP 


SNP00046169 





Link : PTGS2_link_genomic



Subsequence
Subsequence
Subsequence
CDS PTGS2_cds.1

exon 1925 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

exon 

mRNA

exon 
exon 



11097 
8146 



GB:HUMPTGS2 101 
PTGS2_cds.1 1925
PTGS2_mrna_build.1
1815 bp 10 exons 

1976 
2893 
3157 
3954 
4851 
5667 
6033 
6601 
7250 
8146 

PTGS2_mrna_build.1 3373 bp

1828 1976 
2777 2893 




2777 
3014 
3811 
4670 
5584 
5787 
6315 
7103 
7737 



#401 
#402 
1828 
#402 



9607 #403 
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exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



3014 
3811 
4670 
5584 
5787 
6315 
7103 
7737 



3157 

3954

4851 

5667 

6033 

6601 

7250 

9607 



GB:HUMPTGS2

source 

consequence 

GB:HUMPTGS2 

source 

consequence 

GB:HUMPTGS2 

source 

consequence 

GB:HUMPTGS2

source

consequence

GB:HUMPTGS2

source 

consequence 

GB:HUMPTGS2 

source 

consequence 

GB:HUMPTGS2



consequence 
GB:HUMPTGS2 
source 
consequence 
GB:HUMPTGS2



401 3050 
wetSNP 
PTGS2_cds.1
401 3090
wetSNP
PTGS2_cds.1
401 3174
wetSNP
PTGS2_cds.1
401 3793
wetSNP
PTGS2_cds.1
401 3829
wetSNP
PTGS2_cds.1
401 5605
wetSNP
PTGS2_cds.1
401 5676
wetSNP
PTGS2_cds.1
401 5746 



3050 C>G

GB:HUMPTGS2.v3050.G>C

402 Silent 102-102

3090 A>G

GB:HUMPTGS2.v3090.C>T
402 Intron
3174 C>G

GB:HUMPTGS2.v3174.G>C
402 Intron
3793 A>G

GB:HUMPTGS2.v3793.C>T

402 Silent 132-132

3829 A>G

GB:HUMPTGS2.v3829.T>C

402 Silent 144-144

5605 A>G

GB:HUMPTGS2.v5605.G>A

402 Intron

5681 TATTTT>TT

GB:HUMPTGS2.v5676.TATTTT>TT

402 Intron 

5746 G>T 



isSNP SNP00076329 



consequence 
GB:HUMPTGS2
source
consequence
GB:HUMPTGS2
source
consequence
GB:HUMPTGS2
source
consequence
GB:HUMPTGS2

source

consequence

GB:HUMPTGS2

source

consequence

GB:HUMPTGS2

source

consequence

GB:HUMPTGS2

source 



PTGS2_cds.1
401 6249
wetSNP
PTGS2_cds.1
401 6444
wetSNP
PTGS2_cds.1
401 6453
wetSNP
PTGS2_cds.1
401 7581
wetSNP
PTGS2_cds.1
401 7763
wetSNP
PTGS2_cds.1
401 7986
wetSNP
PTGS2_cds.1
401 8167
isSNP SNP00076330
PTGS2_cds.1 402
401 8434 8434
isSNP SNP00046168



261-261 



402 Stop 
6249 A>G

GB:HUMPTGS2.v6249.G>A
402 Silent 335-335
6444 A>G

GB:HUMPTGS2.v6444.G>A
402 Silent 400-400
6453 A>G

GB:HUMPTGS2.v6453.T>C
402 Silent 403-403
7581 A>G

GB:HUMPTGS2.v7581.T>C
402 Intron
7763 A>G

GB:HUMPTGS2.v7763.T>C
402 Missense 511-511
7986 G>T

GB:HUMPTGS2.v7986.C>A
402 Silent 585-585
8167 A>G



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consequence PTGS2_cds . 1 402 3 ' 

Allele GB:HUMPTGS2 401 8473 8473 A>G 

source isSNP SNP00012871 

consequence PTGS2_cds.1 402 3'

Allele GB:HUMPTGS2 401 9102 9102 A>G

source isSNP SNP00046169

consequence PTGS2_cds.1 402 3'
GIF PTGS2-genomic-fwd.gif



PTHLH 

Full name : PTHLH

Link : PTHLH_link_genomic 

Subsequence PTHLH_cds.1 106964 117899 #404

Subsequence GB:AC008011_6 1 183178 #405

Subsequence PTHLH_mrna_build.1 106942 118367

CDS PTHLH_cds.1 534 bp 3 exons #404

exon 106964 107064

112688 113110

117890 117899

PTHLH_mrna_build.1 1024 bp 3 exons

106942 107064
112688 113110
117890 118367

GB:AC008011_6 405 113450 113450

source isSNP SNP00043978

consequence PTHLH_cds.1 404 Intron
GB:AC008011_6 405 115075 115075

source dbSNP gnl|dbSNP|ss1455356_allele

consequence PTHLH_cds.1 404 Intron
GB:AC008011_6 405 115160 115160

source dbSNP gnl|dbSNP|ss1067559_allele



exon 
exon 

mRNA

exon 
exon 
exon 
Allele 



Allele 



Allele 



consequence PTHLH_cds.1 404
GIF PTHLH-genomic-fwd.gif



Intron 



PTHR1

Full name : PTHR1
Link : PTHR1_link_cdna

Subsequence GB:HUMPTHR 1 1948 #407

CDS GB:HUMPTHR.1 1782 bp #408

ORF 29 1810
Allele GB:HUMPTHR 407 1417 1417 A>G

source isSNP SNP00007059

consequence GB:HUMPTHR.1 408
GIF PTHR1-cdna-fwd.gif
Link : PTHR1_link_genomic



Subsequence 


GB:HSPTHPRH1 


1 


262 


#409 


Subsequence 


GB:HSPTHPRH2 


363 


769 


#410 


Subsequence 


GB:HSPTHPRH3 


870 


1168 


#411 


Subsequence 


GB:HSPTHPRH4 


1269 


2146 


#412 


Subsequence 


GB:HSPTHPRH5 


2247 


3249 


#413 


Subsequence 


GB:HSPTHPRH6


3350 


4062 


#414 






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Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
CDS PTHR1_cds.1

exon 107 

exon 

exon 



GB:HSPTHPRH7 




4163 


4475 


#415 


GB:HSPTHPRH8




4576 


4995 


#416 


GB:HSPTHPRH9 




5096 


5696 


#417 


PTHR1_cds.1


107 


5558 


#418 




PTHR1_mrna_build.1




79


5696 


1782 bp 


14 exons 


#418 





mRNA 



exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 
exon 

exon 



181 
558 
1070 
1546 
1773 
2053 
2546 
3133 
3607 
4004 
4367 
4769 
4892 
5558 

PTHR1_mrna_build.1
79 181 
558 
1070 
1546 
1773 
2053 
2546 
3133 
3607 
4004 
4367 
4769 
4892 
5696 

GB:HSPTHPRH3 411
source wetSNP
consequence PTHR1_cds.1
GB:HSPTHPRH8 416
source wetSNP
consequence PTHR1_cds.1
GIF PTHR1-genomic-fwd.gif



exon 



exon 
exon 
exon 
exon 
exon 
exon 
exon 
Allele 



Allele 



456 
936 
1436 
1655 
1959 
2351 
2980 
3547 
3938 
4273 
4628 
4851 
5172 



456 

936 

1436 

1655 

1959 

2351 

2980 

3547 

3938 

4273 

4628 

4851 

5172 



104 104 A>G 

GB:HSPTHPRH3.v104.G>A

418 Silent 72-72 A

311 311 A>G

GB:HSPTHPRH8.v311.T>C

418 Silent 463-463 



RARA

Full name : retinoic acid receptor, alpha
Link : RARA_link_cdna

Subsequence GB:NM_000964 1 2907 #420

CDS GB:NM_000964.1 1389 bp #421

ORF 103 1491
Allele GB:NM_000964 420 2327 2327 A>G

source isSNP SNP00016145

consequence GB:NM_000964.1 421 3'

Allele GB:NM_000964 420 2439 2439 A>G
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source isSNP SNP00049381 

consequence GB:NM_000964.1 421 3' 

GIF RARA-cdna-fwd.gif 



RIN1

Full name : ras inhibitor-
Link : RIN1_link_cdna

Subsequence GB:HUMRASINF 1 1285 #422

Allele GB:HUMRASINF 422 260 260 A>G

source isSNP SNP00123606

Allele GB:HUMRASINF 422 424 424 A>G

source isSNP SNP00123607

Allele GB:HUMRASINF 422 722 722 A>G

source isSNP SNP00033587

Allele GB:HUMRASINF 422 921 921 A>G 

source isSNP SNP00007808 



Missense
A>G 



Missense 
A>G 



ROR2 

Full name : receptor tyrosine kinase-like orphan receptor 
Link : ROR2_link_cdna 

Subsequence GB:NM_004560 1 4092 #423 

CDS GB:NM_004560.1 2832 bp #424

ORF 200 3031 

Allele GB:NM_004560 423 932 932 A>G 

source isSNP SNP00098926 

consequence GB:NM_004560.1 424

Allele GB:NM_004560 423 1460 1460

source isSNP SNP00098927

consequence GB:NM_004560.1 424

Allele GB:NM_004560 423 1973 1973

source isSNP SNP00098928

consequence GB:NM_004560.1 424

Allele GB:NM_004560 423 2287 2287

source isSNP SNP00028168

consequence GB:NM_004560.1 424

Allele GB:NM_004560 423 2353 2353

source isSNP SNP00098929

consequence GB:NM_004560.1 424

Allele GB:NM_004560 423 2654 2654

source isSNP SNP00028169

consequence GB:NM_004560.1 424

Allele GB:NM_004560 423 3743 3743

source isSNP SNP00028170

consequence GB:NM_004560.1 424 3'

Allele GB:NM_004560 423 3872 3872 G>T

source isSNP SNP00074568

consequence GB:NM_004560.1 424 3'

Allele GB:NM_004560 423 3919 3919 G>T

source isSNP SNP00074569

consequence GB:NM_004560.1 424 3'

GIF ROR2-cdna-fwd.gif 




Mi 
A>G 



Silent 
A>G 



Silent 
A>G 



Missense
A>G 
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RORA 

Full name : RAR-related orphan receptor alpha. 
Link : RORA_link_genomic



Subsequence
Subsequence
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 



RORA_cds.1 64220 3076
RORA_cds.2 64220 3076
RORA_cds.4 64220 3076
GB:AC012344_4_000018
GB:AC012344_4_000020
GB:AC012344_4_000021
GB:AC012344_4_000019
GB:AC012344_4_000023
RORA_mrna_build.1 64309
RORA_mrna_build.4 64290



#425 
#426 
#427 



9555 
21286 



mRNA



RORA_mrna_build.4 1908 bp



9454 #428 

21185 #429 

34347 #430 

34448 43824 #431

43925 65900 #432 
2885 #433 
2885 #434 

11 exons #434 



exon 


64290 


64084 


exon 


51847 


51714 


exon 


25290 


25205 


exon 


19553 


19412 


exon 


16417 


16022 


exon 


10425 


10304 


exon 


9288 


9156 


exon 


8488 


8381 


exon 


6690 


6580 


exon 


5625 


5513 


exon 


3240 


2885 


CDS RORA_cds.1


1671 bp 


exon 


64220 


64084 


exon 


43229 


43148 


exon 


41851 


41776 


exon 


25290 


25205 


exon 


19553 


19412 


exon 


16417 


16022 


exon 


10425 


10304 


exon 


9288 


9156 


exon 


8488 


8381 


exon 


6690 


6580 


exon 


5625 


5513 


exon 


3240 


3076 


CDS RORA_cds.2


1275 bp 


exon 


64220 


64084 


exon 


43229 


43148 


exon 


41851 


41776 


exon 


25290 


25205 


exon 


19553 


19412 


exon 


10425 


10304 


exon 


9288 


9156 


exon 


8488 


8381 


exon 


6690 


6580


exon 


5625 


5513 


exon 


3240 


3076 



exon 
exon 



RORA_mrna_build.1
64309 64084 
43229 43148 
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41851 


41776 




25290 


25205 




19553 


19412 


exon 


16417 


16022 


exon 


10425 


10304


exon 






exon 


8488 


8381 


exon 


6690 


6580 


exon 


5625 


5513 




3240 


2885 


CDS RORA_cds.4


1647 bp 


exon


64220




exon 


51847 


51714 




25290 


25205 


exon 


19553 


19412 


exon 


16417 


16022 


exon 


10425 


10304 




9288 


9156 


exon 


8488 


8381 


exon 


6690 


6580 


exon 


5625 


5513 


exon 


3240 


3076 



GB:AC012344_4_000020 429 11153 11153 A>G

source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11182 11182 A>G

source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11183 11183 A>T

source dbSNP gnl|dbSNP|ss507731_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11254 11254 A>G

source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11255 11255 A>T

source dbSNP gnl|dbSNP|ss507731_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11264 11264 A>G

source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron
consequence RORA_cds.2 426 Intron
consequence RORA_cds.4 427 Intron
GB:AC012344_4_000020 429 11265 11265 A>T

source dbSNP gnl|dbSNP|ss507731_allele

consequence RORA_cds.1 425 Intron
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TABLE 1 (Cont.) 

consequence RORA_cds.2 426 Intron

consequence RORA_cds.4 427 Intron

Allele GB:AC012344_4_000020 429 11320 11320 A>G
source dbSNP gnl|dbSNP|ss380580_allele

consequence RORA_cds.1 425 Intron

consequence RORA_cds.2 426 Intron

consequence RORA_cds.4 427 Intron

GIF RORA-genomic-rev.gif 



SCRG1

Full name : scrapie responsive protein
Link : SCRG1_link_genomic

Subsequence SCRG1_cds.1 30577 33650 #435

Subsequence GB:AC009588_4 1 164772 #436

Subsequence SCRG1_mrna_build.1 30561 33845 #437

CDS SCRG1_cds.1 297 bp 2 exons #435

exon 30577 30818

exon 33596 33650
mRNA SCRG1_mrna_build.1 508 bp 2 exons #437

exon 30561 30818

exon 33596 33845
GIF SCRG1-genomic-fwd.gif



SCYA20

Full name : small inducible cytokine subfamily A member 20 
Link : SCYA20_link_cdna 

Subsequence GB:HSU64197 1 821 #438 

CDS GB:HSU64197.1 288 bp #439

ORF 43 330 

Allele GB:HSU64197 438 341 341 A>G 

isSNP SNP00037526 

GB:HSU64197.1 439 3' 

GB:HSU64197 438 728 728 A>G
isSNP SNP00037527 

GB:HSU64197.1 439 3' 



source 
consequence 



Allele 

source

consequence
GIF SCYA20-cdna-fwd.gif
Link : SCYA20_link_genomic

Subsequence SCYA20_cds.1
Subsequence 
Subsequence 



73925 77096 #440 
GB:AC027560_2 1 129588 #441 

SCYA20_mrna_build.1 73883 77577 #442



CDS SCYA20_cds.1 288 bp

exon 73925 74000

exon 75470 75581

exon 76320 76397

exon 77075 77096

mRNA SCYA20_mrna_build.1

exon 73883 74000

exon 75470 75581

exon 76320 76397

exon 77075 77577

Allele GB:AC027560_2 441



811 bp 









TABLE 1 (Cont.) 

source isSNP SNP00037526 

consequence SCYA20_cds.1 440 3'

Allele GB:AC027560_2 441 77493 77493 A>G

source isSNP SNP00037527

consequence SCYA20_cds.1 440 3'

GIF SCYA20-genomic-fwd.gif 



SDC2 

Full name : syndecan 2 
Link : SDC2_link_cdna 

Subsequence GB:HUMHSPGC 1 

CDS GB:HUMHSPGC.2 1194 bp 

ORF 1 1194 
GB:HUMHSPGC



#443 
#444 



Allele 



consequence 



443 435 435 
isSNP SNP00116695 
GB:HUMHSPGC.2



GB:HUMHSPGC 443



463 



463 



isSNP SNP00050825 
GB:HUMHSPGC.2 
443 741 741 
isSNP SNP00033651 
GB:HUMHSPGC.2 
Allele GB:HUMHSPGC 443 1041 1041

source isSNP SNP00099428

consequence GB:HUMHSPGC.2
GIF SDC2-cdna-fwd.gif



source 
consequence 
GB:HUMHSPGC
source
consequence



444
C>G



444 
A>G 



444 
G>T 



SDC4 

Full name : syndecan 4 
Link : FL_1394592_link_cdna 

Subsequence FN:1394592CB1

CDS FN:1394592CB1.1 594 bp 
ORF 23 616 

CDS GB:HS453C12_1394592CD1 594 bp
ORF 87967 88026 
ORF 100431 100569 
ORF 103282 103328 
ORF 105787 105985 
ORF 108936 109084 

mRNA FL_1394592_mrna_build.1
ORF 87945 88026 
ORF 100431 
ORF 103282 
ORF 105787 
ORF 108936 

Allele 



2112 
#446 



100569 
103328 
105985 
110578 

FN:1394592CB1 445 653 

source isSNP SNP00124074 

consequence FN:1394592CB1.1
FN:1394592CB1 445 749

source isSNP SNP00124075

consequence FN:1394592CB1.1



446 
749 



3' 
A>G 
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Allele. FN:1394592CB1 445 856 

source isSNP SNP00053065 

consequence FN:1394592CB1.1

Allele FN:1394592CB1 445 884

source isSNP SNP00066145

consequence FN:1394592CB1.1

Allele FN:1394592CB1 445 1048

source isSNP SNP00066146

consequence FN:1394592CB1.1

Allele FN:1394592CB1 445 1214

source isSNP SNP00029910

consequence FN:1394592CB1.1

GIF SDC4-cdna-fwd.gif
Link : FL_1250708_link_genomic

Subsequence 



446 
884 



446 
1048 



446 
1214 



3' 
A>G 



Subsequence 
Subsequence 
Subsequence 
Subsequence 
Subsequence 



GB:HS453C12 1 147620 
GB:HS453C12_1394592CD1 87967
GB:HS453C12_2027624CD1
FL_1394592_mrna_build.1



#271 

109084 #272 
20194 10528 #273 
87945 110578 #274 



CDS GB:HS453C12_1394592CD1 
exon 87967 88026 
exon 100431 100569 
exon 103282 103328 
exon 105787 105985 
exon 108936 109084



FL_2027624_mrna_build.1 20197 6152 #275
OA21_cds.1 20194 17050 #276
594 bp



mRNA

exon 
exon 
exon 
exon 
exon 
Allele 



FL_1394592_mrna_build.1 2110 bp
87945 88026 
100431 



103282 
105787 
108936 
GB:HS453C12
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 
source 
consequence 
GB:HS453C12 



100569 
103328 
105985 
110578 

271 90320 90320 A>G 
isSNP SNP00026142 

GB:HS453C12_1394592CD1 272 Intron
271 90420 90420 C>G
isSNP SNP00026143

GB:HS453C12_1394592CD1 272 Intron

271 96768 96768 A>G

dbSNP gnl|dbSNP|ss736312_allele



GB:HS453C12_1394592CD1 272 
271 109121 109121 
isSNP SNP00124074 
GB:HS453C12_1394592CD1 272 
271 109217 109217 
isSNP SNP00124075 
GB:HS453C12_1394592CD1 272 
271 109324 109324 
isSNP SNP00053065 
GB:HS453C12_1394592CD1 272 
271 109352 109352 
isSNP SNP00066145 
consequence GB:HS453C12_1394592CD1 272 
GB:HS453C12 271 109516 109516
source isSNP SNP00066146 




consequence 

GB:HS453C12 

source 

consequence 

GB:HS453C12 

source 

consequence 

GB:HS453C12 

source 



Intron 
C>G



3' 
A>G 



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consequence GB :HS453C12_1394592CD1 272 3/ 

Allele GB:HS453C12 271 109682 109682 A>G 

source isSNP SNP00029910

consequence GB:HS453C12_1394592CD1 272 3' 
GIF SDC4-genomic-fwd.gif 



SEDL 

Full name : sedlin 
Link : SEDL_link_cdna

Subsequence GB:NM_014563_1 1 2816 #447

CDS GB:NM_014563_1.1 423 bp #448
ORF 230 652

Allele GB:NM_014563_1 447 991 991 G>T

source dbSNP gnl|dbSNP|ss380525_allele

source dbSNP gnl|dbSNP|ss531221_allele

consequence GB:NM_014563_1.1 448 3'

Allele GB:NM_014563_1 447 2026 2026 A>G

source dbSNP gnl|dbSNP|ss637643_allele

source dbSNP gnl|dbSNP|ss869682_allele

source dbSNP gnl|dbSNP|ss1272499_allele

source dbSNP gnl|dbSNP|ss232503_allele

source dbSNP gnl|dbSNP|ss459122_allele

consequence GB:NM_014563_1.1 448 3'

Allele GB:NM_014563_1 447 2391 2391 C>G

source isSNP SNP00010387

consequence GB:NM_014563_1.1 448 3'

GIF SEDL-cdna-fwd.gif



SKI 

Full name : v-ski avian sarcoma viral oncogene homolog
Link : SKI_link_cdna

Subsequence GB:NM_003036 1

CDS GB:NM_003036.1 2187 bp

ORF 73 2259

GB:NM_003036 449 528

source isSNP SNP00068450

consequence GB:NM_003036.1
GB:NM_003036 449 1146

source isSNP SNP00068451

consequence GB:NM_003036.1
GB:NM_003036 449 3482

source isSNP SNP00068452

consequence GB:NM_003036.1
GIF SKI-cdna-fwd.gif



Allele 



Allele 



Allele 



3511 
#450 



528 



450 
1146 



450 
3482 



450 



#449 



Silent 
A>G 



Silent 
C>G



SOD2 

Full name : superoxide dismutase 2, mitochondrial 
Link : SOD2_link_cdna

Subsequence EM:HSSOD 1 1026 #451 

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Allele EMiHSSOD 451 243 243 

source isSNP SNP00021476 

Link : SOD2_link_genomic 

Subsequence EM:S77127 101 12957

Subsequence SOD2_link_cds.1 957

Subsequence SOD2_mrna_build.1 953



mRNA



SOD2_mrna_build.1 1026 bp



#452 

11597 #453 
11950 #454 
5 exons 



exon


953 979 


exon 


1260 1462 




5859 5975 


exon 


9061 9240 




11452 11950 


CDS SOD2_link_cds.1


exon 


957 979 


exon


1260 1462 


exon 


5859 5975 


exon 


9061 9240 


exon 


11452 11597 


Allele 


EM:S77127




source 




source 




consequence 


Allele 


EM:S77127 




source 




consequence 


Allele 


EM:S77127 




source 




consequence 



452 1183 1183 A>G 
isSNP SNP00003080 

wetSNP EM:S77127.v1183.C>T

SOD2_link_cds.1 453 Missense
1456 1456 G>T 



16-16 A>V 



452 
wetSNP 



452 1734 1734 
isSNP SNP00107369 
SOD2_link_cds.1
GIF SOD2-genomic-fwd.gif 



:S77127.v1456.A>C
L 453 Intron 
A>G 



SOD3 

Full name : superoxide dismutase 3, extracellular
Link : SOD3_link_cdna

Subsequence GB:SOD3 1 1984 #455

CDS GB:SOD3.1 723 bp #456

ORF 664 1386
Allele GB:SOD3 455 835 835 A>G

source isSNP SNP00033027

consequence GB:SOD3.1 456 Missense 58-58 T>A

Allele GB:SOD3 455 874 874 A>G

source isSNP SNP00062433

consequence GB:SOD3.1 456 Silent 71-71 L

Allele GB:SOD3 455 1469 1469 A>G

source isSNP SNP00067750

consequence GB:SOD3.1 456 3'
Allele GB:SOD3 455 1496 1496 A>G

source isSNP SNP00007500

consequence GB:SOD3.1 456 3'
Allele GB:SOD3 455 1817 1817 G>T

source isSNP SNP00104042

consequence GB:SOD3.1 456 3'
Allele GB:SOD3 455 1826 1826 A>G

source isSNP SNP00031110
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Allele 



A>G 



consequence GB:SOD3.1 456 
GB:SOD3 455 1932 1932 

source isSNP SNP00050239 

consequence GB:SOD3.1 456 3'
GIF SOD3-cdna-fwd.gif
Link : FL_1534327_link_genomic

Subsequence GB:HSU10116 1 10079 #457

Subsequence GB:HSU10116_1534327CD1 5085

Subsequence FL_1534327_mrna_build.1 1130

mRNA FL_1534327_mrna_build.1 1427 bp

exon 1130 1219
exon 5069 6405 
CDS GB:HSU10116_1534327CD1 723 bp 1 exon 

exon 5085 5807 



5807 #458 
6405 #459 
2 exons 



GB:HSU10116 
source 
consequence 
GB:HSU10116 



consequence 
GB:HSU10116 
source 
consequence 
GB:HSU10116 
source 



457 5256 5256 A>G 
isSNP SNP00033027 
GB:HSU10116_1534327CD1 
457 5295 5295 A>G 
isSNP SNP00062433 
GB:HSU10116_1534327CD1 
457 5890 5890 A>G 
isSNP SNP00067750 
GB:HSU10116_1534327CD1 
457 5917 5917 A>G 
isSNP SNP00007500 
consequence GB:HSU10116_1534327CD1
GB:HSU10116 457 6238 6238 G>T
source isSNP SNP00104042

consequence GB:HSU10116_1534327CD1
GB:HSU10116 457 6247 6247 A>G
source isSNP SNP00031110

consequence GB:HSU10116_1534327CD1
GB:HSU10116 457 6353 6353 A>G
source isSNP SNP00050239

consequence GB:HSU10116_1534327CD1
GIF SOD3-genomic-fwd.gif 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



Allele 



SOX9 

Full name : SOX9 
Link : SOX9_link_cdna

Subsequence GB:HSSOX9MRN 1 

CDS GB:HSSOX9MRN.2 1530 bp 

ORF 360 1889 
Allele GB:HSSOX9MRN 460 866 

source isSNP SNP00092616 

consequence GB:HSSOX9MRN.2
Allele GB:HSSOX9MRN 460 1571

source isSNP SNP00108001

consequence GB:HSSOX9MRN.2
Allele GB:HSSOX9MRN 460 1912

source isSNP SNP00055269

consequence GB:HSSOX9MRN.2
Allele GB:HSSOX9MRN 460 2374




3923 #460 
#461 



461 Silent 
1571 A>G 



461 Silent 
1912 G>T 



461 3' 
2374 A>G
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461 
3224 



461 
3470 



461 



C>G



A>G 



source isSNP SNP00041454

consequence GB:HSSOX9MRN.2
Allele GB:HSSOX9MRN 460 3224

source isSNP SNP00061027

consequence GB:HSSOX9MRN.2
Allele GB:HSSOX9MRN 460 3470

source isSNP SNP00055270

consequence GB:HSSOX9MRN.2
GIF SOX9-cdna-fwd.gif
Link : FL_5425567_link_genomic

Subsequence GB:AC007461_8_5425567CD1 63884 60889 #462

Subsequence GB:AC007461_8 1 180385 #463

Subsequence SOX9_mrna_build.1 64243 58856 #464

CDS GB:AC007461_8_5425567CD1 1530 bp 3 exons #462

exon 63884 63454
exon 62557 62304
exon 61733 60889
mRNA SOX9_mrna_build.1 3922 bp 3 exons #464

exon 64243 63454
exon 62557 62304
exon 61733 58856
Allele GB:AC007461_8 463 59309 59309 A>G

source isSNP SNP00055270

consequence GB:AC007461_8_5425567CD1 462 3'

Allele GB:AC007461_8 463 59555 59555 C>G

source isSNP SNP00061027

consequence GB:AC007461_8_5425567CD1 462 3'

Allele GB:AC007461_8 463 60078 60078 A>G

source isSNP SNP00010889

consequence GB:AC007461_8_5425567CD1 462 3'

Allele GB:AC007461_8 463 60404 60404 A>G

source isSNP SNP00041454

consequence GB:AC007461_8_5425567CD1 462 3'

Allele GB:AC007461_8 463 60866 60866 G>T

source isSNP SNP00055269

consequence GB:AC007461_8_5425567CD1 462 3'

Allele GB:AC007461_8 463 61207 61207 A>G

source isSNP SNP00108001

consequence GB:AC007461_8_5425567CD1 462 Silent



404-404 
Allele 



GB:AC007461_8 463 62482 62482 A>G 

source isSNP SNP00092615 

source wetSNP GB:AC007461_8.v62482.G>A

consequence GB:AC007461_8_5425567CD1 462 
H 



GIF SOX9-genomic-rev.gif 



STATI2 

Full name : STAT-induced STAT inhibitor-2

Link : FL_2787140_link_cdna

Subsequence FN:2787140CB1 1 2587 #465

CDS FN:2787140CB1.1 927 bp #466
ORF 98 1024
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Allele FN:2787140CB1 465 1325 

source isSNP SNP00041483

consequence FN:2787140CB1.1

Allele FN:2787140CB1 465 1442

source isSNP SNP00106962

consequence FN:2787140CB1.1

Allele FN:2787140CB1 465 1470

source isSNP SNP00041484

consequence FN:2787140CB1.1

Allele FN:2787140CB1 465 1974

source isSNP SNP00106963

consequence FN:2787140CB1.1

GIF STATI2-cdna-fwd.gif
Link : FL_1405668_link_genomic

Subsequence GB:AC012085_1 1

Subsequence FL_2787140_mrna_build.1



466 
1442 



466 
1470 



466 
1974 



3' 

G>T 



3' 
A>G 



177866 
42013 47745 



#467 
#468 



mRNA FL_2787140_mrna_build.1 2580 bp 3 exons

exon 42013 42225
exon 43694 44045
exon 45731 47745

Allele GB:AC012085_1 467 44268 44268 A>G

source isSNP SNP00070304

Allele GB:AC012085_1 467 46492 46492 A>G

source isSNP SNP00041483

Allele GB:AC012085_1 467 46609 46609 G>T

source isSNP SNP00106962

Allele GB:AC012085_1 467 46637 46637 A>G

source isSNP SNP00041484

Allele GB:AC012085_1 467 47141 47141 A>G

source isSNP SNP00106963

GIF STATI2-genomic-fwd.gif



THBS1

Full name : thrombospondin 1
Link : THBS1_link_cdna

Subsequence GB:HSTS

CDS GB:HSTS.1 3513 bp

ORF 
Allele 



5722 
#470 



112 3624 
















GB:HSTS 


469 


1239 


1239 


A>G 








source 


isSNP 


SNP00046537 










consequence 


GB:HSTS.1


470 


Silent 


376-376


D 


GB:HSTS


469 


2210 


2210 


A>G 








source 


isSNP 


SNP00045539










consequence 


GB:HSTS.1


470 


Missense 


700-700


N>S 


GB:HSTS 


469 


2979 


2979 


A>G 








source 


isSNP 


SNP00061983










consequence 


GB:HSTS.1


470 


Silent 


956-956


D 


GB:HSTS


469


3680 


3680 


G>T 








source 


isSNP 


SNP00108514










consequence 


GB:HSTS.1


470


3'








GB:HSTS 


469 


3703 


3703 


A>G 








source 


isSNP 


SNP00013197 










consequence 


GB:HSTS.1


470 


3' 
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TABLE 1 (Cont.) 

Allele GB:HSTS 469 3905 3905 A>G

source isSNP SNP00093327

consequence GB:HSTS.1 470 3'

Allele GB:HSTS 469 5259 5259 A>G

source isSNP SNP00105437

consequence GB:HSTS.1 470 3'

GIF THBS1-cdna-fwd.gif



TIMP1

Full name : Tissue Inhibitor of Metalloproteinase 1
Link : TIMP1_link_cdna

Subsequence FN:411388CB1 1 853 #471

CDS FN:411388CB1.1 621 bp #472

ORF 122 742
Allele FN:411388CB1 471 365 365 C>G

source isSNP SNP00115174

consequence FN:411388CB1.1 472 Missense 82-82

GIF TIMP1-cdna-fwd.gif
Link : FL_3013907_link_genomic

Subsequence GB:HS230G1 1 125515 #473

Subsequence GB:HS230G1_411388CD1 20559 17287 #474

Subsequence TIMP1_mrna_build.1 21613 17186 #475

mRNA TIMP1_mrna_build.1 843 bp 6 exons #475

21613 21501 



exon 
exon 
exon 
exon 
exon 
exon 



20567 20439 
19039 18960 
18770 18644 
18432 18308 
17454 17186 
CDS GB:HS230G1_411388CD1
exon 20559 20439
exon 19039 18960
exon 18770 18644
exon 18432 18308
exon 17454 17287
Allele GB:HS230G1
source 
consequence 



621 bp 



473 

wetSNP 

GB:HS230G1_411388CD1



17434 17434 A>G

GB:HS230G1.v17434.G>A



I 

Allele 



GB:HS230G1 

source 

consequence 

GB:HS230G1 

source 

consequence 

GB:HS230G1

source

consequence

GB:HS230G1

source 



473 17550 17550 A>G 
isSNP SNP00099224 

GB:HS230G1_411388CD1 474 Intron

473 18046 18046 A>G
isSNP SNP00099223

GB:HS230G1_411388CD1 474 Intron

473 18088 18088 A>G
isSNP SNP00030937

GB:HS230G1_411388CD1 474 Intron

473 18389 18389 A>G

wetSNP GB:HS230G1.v18389.A>G



consequence GB:HS230G1_411388CD1



474



GB:HS230G1 473
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source isSNP SNP00099222 

source wetSNP GB:HS230G1.v18495.C>G

consequence GB:HS230G1_411388CD1 474 Intron

Allele GB:HS230G1 473 18711 18711 A>G

source wetSNP GB:HS230G1.v18711.G>A

consequence GB:HS230G1_411388CD1 474 Silent

Allele GB:HS230G1 473 18728 18728 C>G

source isSNP SNP00115174

consequence GB:HS230G1_411388CD1 474 Missense

GIF TIMP1-genomic-rev.gif



82-82 R>G 
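
The exon coordinates in an entry can be cross-checked against the stated sequence lengths and consequence positions. The following is a minimal sketch of such a check, using the TIMP1 genomic coordinates from the entry above. The function names `span_length` and `cds_offset` are hypothetical helpers introduced for illustration, and the sketch assumes that exon coordinates are inclusive and are listed in transcript order, which is consistent with the lengths given in the table but is not stated in it.

```python
def span_length(exons):
    """Total number of bases covered by inclusive (start, end) pairs on either strand."""
    return sum(abs(start - end) + 1 for start, end in exons)


def cds_offset(exons, position):
    """1-based offset of a genomic position within the concatenated exons.

    Returns None when the position falls outside every exon (for example, in an intron).
    Assumes exons are listed in transcript order, as they appear in the table.
    """
    offset = 0
    for start, end in exons:
        lo, hi = min(start, end), max(start, end)
        if lo <= position <= hi:
            return offset + abs(position - start) + 1
        offset += hi - lo + 1
    return None


# TIMP1 coordinates on GB:HS230G1 (reverse orientation), copied from the table.
timp1_cds = [(20559, 20439), (19039, 18960), (18770, 18644),
             (18432, 18308), (17454, 17287)]
timp1_mrna = [(21613, 21501), (20567, 20439), (19039, 18960),
              (18770, 18644), (18432, 18308), (17454, 17186)]

cds_len = span_length(timp1_cds)        # compare with the stated 621 bp
mrna_len = span_length(timp1_mrna)      # compare with the stated 843 bp
offset = cds_offset(timp1_cds, 18728)   # allele at 18728 (SNP00115174)
codon = (offset - 1) // 3 + 1           # compare with the stated residue 82-82
intronic = cds_offset(timp1_cds, 17550) # allele at 17550, listed as Intron
```

Under these assumptions the computed lengths and the codon number agree with the values given in the TIMP1 entry, and the allele at 17550 falls outside every coding exon.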



TIMP2

Full name : Tissue Inhibitor of Metalloproteinase-2
Link : TIMP2_link_genomic
Subsequence TIMP2_cds.1 822 3126 #476
Subsequence GB:S68860_1 1 970 #477
Subsequence GB:U44382_1 1071 1320 #478
Subsequence GB:U44383_1 1421 1644 #479
Subsequence GB:U44384_1 1745 2283 #480
Subsequence GB:U44385_1 2384 3750 #481
Subsequence TIMP2_mrna_build.1 810 3251 #482
CDS TIMP2_cds.1 663 bp 5 exons #476
exon 822 951
exon 1125 1225
exon 1504 1612
exon 1939 2063
exon 2929 3126
mRNA TIMP2_mrna_build.1 800 bp
exon 810 951
exon 1125 1225
exon 1504 1612
exon 1939 2063
exon 2929 3251
Allele GB:U44383_1 479 155 155 A>G
source wetSNP GB:U44383_1.v155.G>A
consequence TIMP2_cds.1
GIF TIMP2-genomic-fwd.gif



TNA

Full name : tetranectin
Link : TNA_link_cdna

Subsequence GB:NM_003278 1 874 #483

CDS GB:NM_003278.1 609 bp #484

ORF 94 702
Allele GB:NM_003278 483 409 409 A>G

source isSNP SNP00007942

consequence GB:NM_003278.1 484 Missense 106-106 S>G

Allele GB:NM_003278 483 744 744 A>G

source isSNP SNP00007943

consequence GB:NM_003278.1 484 3'

GIF TNA-cdna-fwd.gif



Link : TNA_link_genomic

Subsequence TNA_cds.1 254 1629 #485
Subsequence TNA_cds.2 254 1629 #486
Subsequence GB:X70910_1 1 570 #487
Subsequence GB:X70911_1 671 978 #488
Subsequence GB:X70912_1 1079 1805 #489
Subsequence TNA_mrna_build.1 164 1776
CDS TNA_cds.1 609 bp #485
exon 254 362
exon 829 927
exon 1229 1629
CDS TNA_cds.2 510 bp
exon 254 362
exon 1229 1629
mRNA TNA_mrna_build.1
exon 164 362
exon 829 927
exon 1229 1776
Allele GB:X70912_1 489 258 258 A>G
source isSNP SNP00007942
consequence TNA_cds.1 485 Missense 106-106 S>G
consequence TNA_cds.2 486 Missense 73-73 S>G
Allele GB:X70912_1 489 593 593
source isSNP SNP00007943
consequence TNA_cds.1 485
consequence TNA_cds.2 486
GIF TNA-genomic-fwd.gif



TNFAIP6

Full name : tumor necrosis factor, alpha-induced protein 6
Link : TNFAIP6_link_cdna

Subsequence GB:NM_007115_1 1 1414

CDS GB:NM_007115_1.1 834 bp #492

ORF 69 902
Allele GB:NM_007115_1 491 499 499

source isSNP SNP00040822

consequence GB:NM_007115_1.1 492 Missense
Allele GB:NM_007115_1 491 1143 1143 C>G

source isSNP SNP00040823

consequence GB:NM_007115_1.1 492
GIF TNFAIP6-cdna-fwd.gif
Link : FL_1000909_link_genomic

Subsequence GB:AC009311_1_191918CD1 132384 154250
Subsequence GB:AC009311_1 1 160198 #494
Subsequence TNFAIP6_mrna_build.1 132314 154760
mRNA TNFAIP6_mrna_build.1 1414 bp
exon 132314 132477
exon 138660 138797
exon 140773 140934
exon 144737 144965
exon 148266 148306
exon 154081 154760
CDS GB:AC009311_1_191918CD1 834 bp
exon 132384 132477
exon 138660 138797
exon 140773 140934
exon 144737 144965
exon 148266 148306
exon 154081 154250
Allele GB:AC009311_1 494 140934 140934 A>G
source wetSNP GB:AC009311_1.v140934.G>A
consequence GB:AC009311_1_191918CD1 493 Missense A>T
Allele GB:AC009311_1 494 140942 140942 A>T
source wetSNP GB:AC009311_1.v140942.A>T
consequence GB:AC009311_1_191918CD1 493 Intron
Allele GB:AC009311_1 494 144773 144773 A>G
source isSNP SNP00040822
source wetSNP GB:AC009311_1.v144773.A>G
consequence GB:AC009311_1_191918CD1 493 Missense Q>R
Allele GB:AC009311_1 494 148030 148030 A>G
source dbSNP gnl|dbSNP|ss645109_allele
consequence GB:AC009311_1_191918CD1 493 Intron
Allele GB:AC009311_1 494 148229 148229 A>G
source wetSNP GB:AC009311_1.v148229.T>C
consequence GB:AC009311_1_191918CD1 493 Intron
Allele GB:AC009311_1 494 148245 148245 A>G
source wetSNP GB:AC009311_1.v148245.T>C
consequence GB:AC009311_1_191918CD1 493 Intron
Allele GB:AC009311_1 494 154493 154493 C>G
source isSNP SNP00040823
consequence GB:AC009311_1_191918CD1
GIF TNFAIP6-genomic-fwd.gif



TNFRSF11B

Full name : TNFRSF11B
Link : TNFRSF11B_link_cdna

Subsequence GB:AB002146 1 #496

CDS GB:AB002146.1 1206 bp #497

ORF 1 1206
Allele GB:AB002146 496 768 768
source isSNP SNP00028816
consequence GB:AB002146.1
GIF TNFRSF11B-cdna-fwd.gif
Link : TNFRSF11B_link_genomic

Subsequence TNFRSF11B_cds.1 125 9057
Subsequence GB:E15270_1 1 9898 #499
CDS TNFRSF11B_cds.1 1176 bp
exon 130 499
exon 4504 4695
exon 6716 6940
exon 8669 9057
Allele GB:E15270_1 499 503 503 A>G
source wetSNP GB:E15270_1.v503.C>T
consequence TNFRSF11B_cds.1

Allele GB:E15270_1 499 4499 4499 A>G

source wetSNP GB:E15270_1.v4499.C>T

consequence TNFRSF11B_cds.1 498 Intron

Allele GB:E15270_1 499 4661 4661 A>G

source wetSNP GB:E15270_1.v4661.C>T

consequence TNFRSF11B_cds.1 498 Silent 176-176

Allele GB:E15270_1 499 4749 4752 TCTG>TG

source wetSNP GB:E15270_1.v4749.TCTG>TG

consequence TNFRSF11B_cds.1 498 Intron

Allele GB:E15270_1 499 6599 6599 A>G

source wetSNP GB:E15270_1.v6599.G>A

consequence TNFRSF11B_cds.1 498 Intron

Allele GB:E15270_1 499 6837 6837 A>G

source wetSNP GB:E15270_1.v6837.G>A

consequence TNFRSF11B_cds.1 498 Silent 228-228

Allele GB:E15270_1 499 6891 6891 A>G

source isSNP SNP00028816

consequence TNFRSF11B_cds.1 498 Silent 246-246
GIF TNFRSF11B-genomic-fwd.gif
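
The Allele rows of Table 1 follow a regular layout: a sequence name, a subsequence identifier, start and end positions, and a pair of variants separated by ">". The following is a minimal sketch of reading such rows programmatically. The names `AlleleRow` and `parse_allele` are hypothetical and introduced only for illustration, and the field names `first` and `second` are deliberately neutral because the table does not state which variant is the reference.

```python
import re
from typing import NamedTuple, Optional


class AlleleRow(NamedTuple):
    sequence: str   # e.g. "GB:E15270_1"
    seq_id: int     # subsequence identifier (the "#499" style number)
    start: int      # first position of the polymorphism
    end: int        # last position of the polymorphism
    first: str      # variant written before ">"
    second: str     # variant written after ">"


# One Allele row: keyword, sequence name, identifier, start, end, variants.
ALLELE_ROW = re.compile(
    r"^Allele\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+([ACGT]+)>([ACGT]+)$"
)


def parse_allele(row: str) -> Optional[AlleleRow]:
    """Parse one cleaned Allele row; return None for any other kind of row."""
    match = ALLELE_ROW.match(row.strip())
    if match is None:
        return None
    sequence, seq_id, start, end, first, second = match.groups()
    return AlleleRow(sequence, int(seq_id), int(start), int(end), first, second)


# Rows copied from the TNFRSF11B genomic entry.
snp = parse_allele("Allele GB:E15270_1 499 6837 6837 A>G")
deletion = parse_allele("Allele GB:E15270_1 499 4749 4752 TCTG>TG")
other = parse_allele("GIF TNFRSF11B-genomic-fwd.gif")
```

For a single-nucleotide polymorphism the start and end positions coincide, while the TCTG>TG row spans four positions, matching the length of the first variant.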
What is claimed is:

1. A method of determining susceptibility of an individual to joint space narrowing and/or
osteophyte development and/or joint pain comprising identifying whether the individual has at least one
polymorphism in a polynucleotide encoding at least one of the proteins listed in Table 1.

2. The method of claim 1, wherein said proteins listed in Table 1 are selected from the group
consisting of bone morphogenic protein 2 (BMP2), cartilage intermediate layer protein (CILP),
cartilage oligomeric matrix protein (COMP), tissue inhibitor of metalloproteinase 1 (TIMP1),
tetranectin (TNA), matrix metalloproteinase 3 (MMP3), and prostaglandin-endoperoxide synthase 2
(PTGS2).

3. The method of claim 1, wherein the joint space narrowing and/or osteophyte development
and/or joint pain is associated with a disease.

4. The method of claim 3 wherein the disease is osteoarthritis. 

5. The method of claim 1 where at least one of the polymorphisms is selected from the
polymorphisms listed in Table 1.

6. The method of claim 1 comprising contacting a sample from the individual with a specific
binding agent for the polymorphism and determining whether the agent binds to the polymorphism.

7. The method of claim 1 where the polymorphism in the polynucleotide is determined for
more than one allele of the individual.

8. A method for modulating the susceptibility of an individual to joint space narrowing and/or
osteophyte development and/or joint pain, comprising identifying the individual by the method of claim
1 and administering to the individual a composition comprising an effective amount of an agent which
modulates said susceptibility.

9. The method of claim 8, wherein the joint space narrowing and/or osteophyte development
and/or joint pain is associated with a disease.

10. The method of claim 9 wherein the disease is osteoarthritis.

11. A polynucleotide encoding a protein listed in Table 1 having at least one polymorphism in
the polynucleotide selected from the group of polymorphisms listed in Table 1 for the polynucleotide.

12. A fragment of a polynucleotide encoding a protein selected from Table 1 having at least 
one polymorphism in the fragment selected from the group of polymorphisms listed in Table 1. 

13. A fragment of claim 12 having a length of 8 to 100 nucleotides. 

14. A fragment of claim 12 having a length of 8 to 30 nucleotides. 

15. A fragment of claim 12 having a length of 9 to 15 nucleotides. 

16. A method of identifying an agent for modulating susceptibility of an individual to joint 
space narrowing and/or osteophyte development and/or joint pain comprising: 

a) contacting a test agent with a polypeptide or a polynucleotide encoding the polypeptide 
selected from the list of Table 1 having at least one of the polymorphisms selected from the list of 
Table 1, 

b) determining whether the agent is capable of binding to the polypeptide or polynucleotide
encoding the polypeptide, and

c) determining whether the activity or expression of the polypeptide or polynucleotide 
encoding the polypeptide is modulated. 

17. A method of formulating a composition comprising

a) identifying an agent for modulating the susceptibility of an individual to joint space 
narrowing and/or osteophyte development and/or joint pain by the method of claim 16, and 

b) formulating the agent with a carrier or diluent.

18. An agent identified by the method of claim 16.

19. A composition for modulating the susceptibility of an individual to joint space narrowing
and/or osteophyte development and/or joint pain comprising an agent according to claim 18 and a
carrier.

20. A method comprising using an agent of claim 18 in the manufacture of a medicament for
modulating susceptibility to joint space narrowing and/or osteophyte development and/or joint pain.

21. A probe, primer or antibody which is capable of selectively detecting a polymorphism
listed in Table 1 which is associated with susceptibility to joint space narrowing and/or osteophyte
development and/or joint pain.

22. A vector comprising the polynucleotide of claim 11.

23. A host cell line comprising the vector of claim 22.

24. A nonhuman animal which is transgenic for the polynucleotide of claim 11. 

25. A cell line comprising the polynucleotide of claim 11.

26. A method of using a cell line of claim 25 to screen for an agent for diagnosis of an
individual having susceptibility to joint space narrowing and/or osteophyte development and/or joint
pain.

27. A method of using a nonhuman animal of claim 24 to screen for an agent for diagnosis of
an individual having susceptibility to joint space narrowing and/or osteophyte development and/or joint
pain.

28. A kit for diagnosis of an individual having susceptibility to joint space narrowing and/or
osteophyte development and/or joint pain comprising an agent for detection of the polynucleotide of
claim 11.

29. The kit of claim 28 further comprising instructions for use of said agent for detection of
said polynucleotide.

30. A kit for diagnosis of an individual having susceptibility to joint space narrowing and/or
osteophyte development and/or joint pain comprising an agent for detection of the fragment of a 
polynucleotide of claim 12. 

31. The kit of claim 30 further comprising instructions for use of said agent for detection of
said fragment.

32. A kit for diagnosis of an individual having susceptibility to joint space narrowing and/or
osteophyte development and/or joint pain comprising the probe, primer or antibody of claim 21.

33. The kit of claim 32 further comprising instructions for use of said probe, primer or 
antibody. 